Abstract

Estimation of high dimensional covariance matrices is an interesting and important research 
topic. In this paper, we propose a dynamic structure and develop an estimation procedure 
for high dimensional covariance matrices. Asymptotic properties are derived to justify the 
estimation procedure and simulation studies are conducted to demonstrate its performance when 
the sample size is finite. By exploring a financial application, an empirical study shows that 
portfolio allocation based on dynamic high dimensional covariance matrices can significantly
outperform the market from 1995 to 2014. Our proposed method also outperforms portfolio 
allocation based on the sample covariance matrix and the portfolio allocation proposed in Fan,
Fan and Lv (2008).

1 Introduction


Covariance matrix estimation is an important topic in statistics and econometrics with wide
applications in many disciplines, such as economics, finance and psychology. A traditional approach
to estimating covariance matrices is based on the sample covariance matrix. However, the sample 
covariance matrix would not be a good choice when the dimension is large, and especially when the 
inverse is required, which is often the case when constructing a portfolio allocation in finance. This 
is because the estimation errors would accumulate when using the inverse of the sample covariance 
matrix to estimate the inverse of the covariance matrix. When the size of the covariance matrix is 
large, the cumulative estimation error would become unacceptable even if the estimation error of 
each entry of the covariance matrix is tiny. 

In recent years there have been various attempts to address high dimensional covariance matrix
estimation. Usually, a sparsity condition is imposed to control the trade-off between variance and
bias. See Wu and Pourahmadi (2003), El Karoui (2008), Bickel and Levina (2008a, 2008b), Lam
and Fan (2009), Fan, Liao, and Mincheva (2011), and the references therein. Fan, Fan and Lv 
(2008) considered a different approach by imposing a factor model and estimated the covariance 
matrix based on this structure. 

Most of the literature addressing high dimensional covariance matrix estimation assumes that 
the covariance matrix is constant over time. However, in many applications, covariance matrices 
are dynamic. For example, today’s optimal portfolio allocation may not be optimal tomorrow, or 
next month. Therefore, when applying the formula for Markowitz’s optimal portfolio allocation 
(Markowitz 1959), the covariance matrix used should be dynamic and allowed to change over time. 

In order to introduce a dynamic structure for covariance matrices, one cannot simply assume 
each entry of a covariance matrix is a function of time because this would not serve very well in 
prediction. Instead, we start with an approach motivated by Fan, Fan and Lv (2008), which is
based on the Fama-French three-factor model (Fama and French, 1992, 1993)

$$y_t = g + X_t^{\top} a + \epsilon_t, \tag{1.1}$$

where $y_t$ is the excess return of an asset and $X_t$ is the vector of the three factors at time $t$. To make
(1.1) more flexible, we allow $g$ and $a$ to depend on the values of the three factors at time $t - 1$. To avoid
the so-called ‘curse of dimensionality’, we assume this dependence is through a linear combination
of the values of the three factors at time $t - 1$, which brings us to

$$y_t = g(X_{t-1}^{\top}\beta) + X_t^{\top} a(X_{t-1}^{\top}\beta) + \epsilon_t. \tag{1.2}$$

This motivates a dynamic structure for the covariance matrix of a random vector $Y_t$ through an
adaptive varying coefficient model which we shall now introduce.

Suppose $(X_t^{\top}, Y_t^{\top})$, $t = 1, \dots, n$, is a time series, where $Y_t$ is a $p_n$ dimensional vector and $X_t$
is a $q$ dimensional factor. An underlying assumption throughout this paper is that $p_n \to \infty$ when
$n \to \infty$, and $q$ is fixed. Also, we assume that $X_t$, $t = 1, \dots, n$, is a stationary Markov process.
We assume

$$Y_t = g(X_{t-1}^{\top}\beta) + \Phi(X_{t-1}^{\top}\beta) X_t + \epsilon_t, \quad \|\beta\| = 1, \quad \beta_1 > 0, \tag{1.3}$$

where $\beta = (\beta_1, \dots, \beta_q)^{\top}$, $\Phi(X_{t-1}^{\top}\beta)$ is a factor loading matrix which varies with $X_{t-1}^{\top}\beta$, and
$\{\epsilon_t, t = 1, \dots, n\}$ are random errors which are independent of $\{X_t, t = 1, \dots, n\}$. We assume

$$E(\epsilon_t \mid \{\epsilon_l : l < t\}) = 0, \quad \mathrm{cov}(\epsilon_t \mid \{\epsilon_l : l < t\}) = \Sigma_{0,t} = \mathrm{diag}(\sigma_{1,t}^2, \dots, \sigma_{p_n,t}^2),$$

where 

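$$\sigma_{k,t}^2 = \alpha_{k,0} + \sum_{i=1}^{m} \alpha_{k,i}\, \epsilon_{k,t-i}^2 + \sum_{j=1}^{s} \gamma_{k,j}\, \sigma_{k,t-j}^2, \quad t = 2, \dots, n, \tag{1.4}$$

for each $k = 1, \dots, p_n$ and for some integers $m$ and $s$. Let $\mathcal{F}_t$ be the $\sigma$-algebra generated by
$\{(X_l^{\top}, \epsilon_l^{\top}) : l \le t\}$. The main focus of this paper is on the conditional covariance matrix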

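$$\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1}) = \Phi(X_{t-1}^{\top}\beta)\, \Sigma_x(X_{t-1})\, \Phi(X_{t-1}^{\top}\beta)^{\top} + \Sigma_{0,t}, \tag{1.5}$$

where $\Sigma_x(X_{t-1}) = \mathrm{cov}(X_t \mid X_{t-1})$. In (1.5), $\Phi(\cdot)$, $\beta$, $\Sigma_x(\cdot)$, $\alpha_{k,i}$ and $\gamma_{k,j}$, $i = 0, \dots, m$, $j =
1, \dots, s$, are unknown and need to be estimated. Not only does (1.5) introduce a dynamic
structure for $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$, but it also reduces the number of unknown parameters from $p_n(p_n + 1)/2$
to $p_n q + q^2$ unknown functions and $q + s + m + 1$ unknown parameters.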
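To make the structure in (1.5) concrete, the following is a minimal sketch of how the conditional covariance matrix is assembled once its three ingredients are available. The function name `conditional_cov` and the randomly generated inputs are illustrative assumptions, not part of the paper.

```python
import numpy as np

def conditional_cov(Phi, Sigma_x, sigma2):
    """Assemble Phi Sigma_x Phi^T + Sigma_0 as in (1.5).

    Phi     : (p, q) factor loading matrix evaluated at the lagged index
    Sigma_x : (q, q) conditional covariance of the factors
    sigma2  : (p,) diagonal of the idiosyncratic covariance Sigma_{0,t}
    """
    return Phi @ Sigma_x @ Phi.T + np.diag(sigma2)

# Illustrative inputs only (hypothetical values, not estimates from data).
rng = np.random.default_rng(0)
p, q = 5, 3
Phi = rng.normal(size=(p, q))
A = rng.normal(size=(q, q))
Sigma_x = A @ A.T + np.eye(q)          # a positive definite q x q matrix
sigma2 = rng.uniform(0.5, 1.5, size=p)  # positive idiosyncratic variances
cov_t = conditional_cov(Phi, Sigma_x, sigma2)
```

Because the low-rank part is positive semi-definite and the diagonal part is strictly positive, the assembled matrix is always invertible, which is what the portfolio formula later requires.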

We remark that model (1.3) is interesting in its own right, since it combines single-index
modelling (Carroll et al., 1997, Härdle et al., 1993, Yu and Ruppert, 2002, Xia and Härdle, 2006, Kong
and Xia, 2014) and varying coefficient modelling (Fan and Zhang, 1999, 2000, Fan et al., 2003,
Sun et al., 2007, Zhang et al., 2009, Li and Zhang, 2011, Sun et al., 2014). In this paper, as a
by-product, an estimation procedure for (1.3) is proposed and an iterative algorithm is developed 
for implementation purposes. 

This paper is organised as follows. We begin in Section 2 with a description of the proposed 
estimation procedure for $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$. A discussion on bandwidth selection is given in Section 3.
In Section 4 we provide asymptotic properties of the estimation procedure. An iterative algorithm 
to implement the estimation procedure is suggested in Section 5. Using the proposed dynamic 
structure for covariance matrices and the developed estimation procedure, we outline a process 
for constructing a portfolio allocation based on the formula for Markowitz’s optimal portfolio in
Section 6. The performance of the estimation procedure and portfolio allocation is also assessed
by simulation studies in Section 7. In Section 8, we apply the portfolio allocation methodology 
to a data set consisting of 49 industry portfolios which are freely available from Kenneth French’s 
website. We find that the proposed methodology works surprisingly well. All the detailed proofs 
are relegated to the appendix. 

2 Estimation procedure 

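In this section, we introduce an estimation procedure for $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$. We will first
estimate $\beta$, $\Phi(\cdot)$, $\Sigma_x(\cdot)$, $\alpha_{k,i}$ and $\gamma_{k,j}$, and denote the resulting estimators by $\hat\beta$, $\hat\Phi(\cdot)$, $\hat\Sigma_x(\cdot)$,
$\hat\alpha_{k,i}$ and $\hat\gamma_{k,j}$ for $i = 0, \dots, m$ and $j = 1, \dots, s$. Let $\hat\Sigma_{0,t}$ be $\Sigma_{0,t}$ with $\alpha_{k,i}$ and $\gamma_{k,j}$ being replaced by
$\hat\alpha_{k,i}$ and $\hat\gamma_{k,j}$ respectively. We use

$$\widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1}) = \hat\Phi(X_{t-1}^{\top}\hat\beta)\, \hat\Sigma_x(X_{t-1})\, \hat\Phi(X_{t-1}^{\top}\hat\beta)^{\top} + \hat\Sigma_{0,t} \tag{2.1}$$

to estimate $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$.

Throughout this paper, for any function $f(x)$, we use $\dot f(x)$ to denote its derivative. For any
functional matrix $F = (f_{ij}(x))$, we define its derivative as $\dot F = (\dot f_{ij}(x))$. For any integers $p$ and $q$,
we use $0_{p \times q}$ to denote a $p \times q$ matrix with each entry being 0, and $1_p$ to denote a $p$-dimensional
vector with each component being 1.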

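2.1 Estimation of $\beta$

A Taylor expansion gives, for $X_i^{\top}\beta$ in a neighbourhood of $X_j^{\top}\beta$,

$$\Phi(X_i^{\top}\beta) \approx \Phi(X_j^{\top}\beta) + \dot\Phi(X_j^{\top}\beta)(X_i - X_j)^{\top}\beta$$

and

$$g(X_i^{\top}\beta) \approx g(X_j^{\top}\beta) + \dot g(X_j^{\top}\beta)(X_i - X_j)^{\top}\beta$$

for $j = 1, \dots, n - 1$. This, together with the idea of least squares estimation, brings us to the
following local discrepancy function

$$L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta)$$

$$= \sum_{j=1}^{n-1} \sum_{i=2}^{n} \left\| Y_i - g_j - A_j X_i - (\dot g_j + B_j X_i)(X_{i-1} - X_j)^{\top}\beta \right\|^2 K_h\big((X_{i-1} - X_j)^{\top}\beta\big), \tag{2.2}$$

where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a kernel function, $h$ is a bandwidth, and $g_j$, $\dot g_j$, $A_j$ and $B_j$ are
used to denote $g(X_j^{\top}\beta)$, $\dot g(X_j^{\top}\beta)$, $\Phi(X_j^{\top}\beta)$ and $\dot\Phi(X_j^{\top}\beta)$ respectively. By minimising

$$L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta)$$

under the conditions

$$\|\beta\| = 1, \quad \beta_1 > 0,$$

we use the corresponding value of $\beta$ as the estimator and denote it by $\hat\beta$.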

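2.2 Estimation of $\Phi(\cdot)$ and $g(\cdot)$

Once an estimate $\hat\beta$ has been obtained, the estimators of $\Phi(\cdot)$ and $g(\cdot)$ can be constructed row by
row through a standard univariate varying coefficient model for each component of $Y_t$. Let

$$g(\cdot) = (g_1(\cdot), \dots, g_{p_n}(\cdot))^{\top}, \quad \Phi(\cdot) = (a_1(\cdot), \dots, a_{p_n}(\cdot))^{\top}, \quad Y_t = (y_{1,t}, \dots, y_{p_n,t})^{\top}, \quad \epsilon_t = (\epsilon_{1,t}, \dots, \epsilon_{p_n,t})^{\top}.$$

By (1.3), and for $k = 1, \dots, p_n$, we have the following synthetic univariate varying coefficient
model

$$y_{k,t} = g_k(X_{t-1}^{\top}\hat\beta) + X_t^{\top} a_k(X_{t-1}^{\top}\hat\beta) + \epsilon_{k,t},$$

for $t = 2, \dots, n$. By local linear estimation for standard varying-coefficient models, and for any
given $u$, we have

$$\hat a_k(u) = (I_q, 0_{q \times (q+2)}) (\mathcal{X}^{\top} \mathcal{W} \mathcal{X})^{-1} \mathcal{X}^{\top} \mathcal{W} \mathbf{Y}_k, \quad \hat g_k(u) = (0_{1 \times q}, 1, 0_{1 \times (q+1)}) (\mathcal{X}^{\top} \mathcal{W} \mathcal{X})^{-1} \mathcal{X}^{\top} \mathcal{W} \mathbf{Y}_k,$$

where

$$\mathbf{Y}_k = (y_{k,2}, \dots, y_{k,n})^{\top}, \quad \mathcal{X} = \begin{pmatrix} X_2^{\top} & 1 & (X_1^{\top}\hat\beta - u) & (X_1^{\top}\hat\beta - u) X_2^{\top} \\ \vdots & \vdots & \vdots & \vdots \\ X_n^{\top} & 1 & (X_{n-1}^{\top}\hat\beta - u) & (X_{n-1}^{\top}\hat\beta - u) X_n^{\top} \end{pmatrix},$$

$$\mathcal{W} = \mathrm{diag}\big(K_{h_1}(X_1^{\top}\hat\beta - u), \dots, K_{h_1}(X_{n-1}^{\top}\hat\beta - u)\big),$$

and $h_1$ is a bandwidth.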
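The local linear step for a single component can be sketched as a kernel-weighted least squares fit. The code below is an illustration only: the function name `local_linear_vc`, the use of the Epanechnikov kernel, and the noise-free test data are assumptions made here, and `numpy.linalg.lstsq` on square-root-weighted data is used in place of the explicit weighted normal equations (the two coincide when the weighted design has full column rank).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_linear_vc(y, X, index, u, h1):
    """Local linear estimates (a_hat(u), g_hat(u)) for one response component.

    y     : (n,) responses y_{k,1..n}; y[0] is not used
    X     : (n, q) factors, row t-1 holds X_t
    index : (n,) single-index values X_t^T beta_hat
    """
    n, q = X.shape
    d = index[:-1] - u                      # lagged index minus u, for t = 2..n
    Xt = X[1:]                              # X_t, for t = 2..n
    design = np.hstack([Xt, np.ones((n - 1, 1)), d[:, None], d[:, None] * Xt])
    w = epanechnikov(d / h1) / h1           # kernel weights K_{h1}(.)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y[1:] * sw, rcond=None)
    return coef[:q], coef[q]                # first q entries: a_hat; next: g_hat

# Noise-free illustration with coefficient functions that are linear in the index,
# which a local linear fit reproduces exactly.
rng = np.random.default_rng(0)
n, q = 300, 2
beta_hat = np.array([0.6, 0.8])
X = rng.uniform(-1.0, 1.0, size=(n, q))
index = X @ beta_hat
z = index[:-1]
y = np.zeros(n)
y[1:] = (1.0 + 2.0 * z) + X[1:, 0] * (0.5 - z) + X[1:, 1] * (2.0 + 3.0 * z)
a_hat, g_hat = local_linear_vc(y, X, index, u=0.1, h1=0.5)
```

Repeating this fit for every component and stacking the results row by row gives the estimated loading matrix and intercept vector at the point `u`.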

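2.3 Estimation of $\Sigma_x(\cdot)$

In order to estimate $E(X_t \mid X_{t-1} = u)$ and $E(X_t X_t^{\top} \mid X_{t-1} = u)$, for any given $u$, we use the local
constant estimators

$$\hat E(X_t \mid X_{t-1} = u) = \frac{\sum_{t=2}^{n} X_t K_{h_2}(\|X_{t-1} - u\|)}{\sum_{t=2}^{n} K_{h_2}(\|X_{t-1} - u\|)}, \tag{2.3}$$

$$\hat E(X_t X_t^{\top} \mid X_{t-1} = u) = \frac{\sum_{t=2}^{n} X_t X_t^{\top} K_{h_2}(\|X_{t-1} - u\|)}{\sum_{t=2}^{n} K_{h_2}(\|X_{t-1} - u\|)}.$$

This gives us the following estimator of $\Sigma_x(u)$

$$\hat\Sigma_x(u) = \hat E(X_t X_t^{\top} \mid X_{t-1} = u) - \hat E(X_t \mid X_{t-1} = u) \{\hat E(X_t \mid X_{t-1} = u)\}^{\top}$$

$$= \{\mathrm{tr}(\mathbf{W})\}^{-2}\, \mathbf{X}^{\top} \{\mathrm{tr}(\mathbf{W}) \mathbf{W} - \mathbf{W} 1_{n-1} 1_{n-1}^{\top} \mathbf{W}\} \mathbf{X}, \tag{2.4}$$

where

$$\mathbf{X} = (X_2, \dots, X_n)^{\top}, \quad \mathbf{W} = \mathrm{diag}\big(K_{h_2}(\|X_1 - u\|), \dots, K_{h_2}(\|X_{n-1} - u\|)\big),$$

and $h_2$ is a bandwidth.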
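As a numerical check of (2.4), the sketch below computes the local constant estimator in two ways: directly from the two weighted moments, and through the trace-based matrix expression. The function name `sigma_x_hat`, the Epanechnikov kernel and the simulated factors are assumptions made for illustration.

```python
import numpy as np

def sigma_x_hat(X, u, h2):
    """Local constant estimate of cov(X_t | X_{t-1} = u), computed two ways."""
    lagged, current = X[:-1], X[1:]                  # X_{t-1} and X_t, t = 2..n
    dist = np.linalg.norm(lagged - u, axis=1)
    w = 0.75 * np.clip(1.0 - (dist / h2) ** 2, 0.0, None) / h2
    # Direct form: weighted second moment minus outer product of weighted mean.
    m1 = (w[:, None] * current).sum(axis=0) / w.sum()
    m2 = (current.T * w) @ current / w.sum()
    direct = m2 - np.outer(m1, m1)
    # Matrix form of (2.4) with W = diag(w) and a vector of ones.
    W = np.diag(w)
    ones = np.ones(len(w))
    inner = np.trace(W) * W - W @ np.outer(ones, ones) @ W
    matrix_form = current.T @ inner @ current / np.trace(W) ** 2
    return direct, matrix_form

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
direct, matrix_form = sigma_x_hat(X, u=np.zeros(3), h2=0.8)
```

The direct form avoids building the dense diagonal matrix and is the cheaper choice in practice; the matrix form is shown only to mirror the displayed formula.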


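2.4 Estimation of $\Sigma_{0,t}$

For each $k$, $k = 1, \dots, p_n$, let

$$r_{k,t} = \hat\epsilon_{k,t} = y_{k,t} - \hat g_k(X_{t-1}^{\top}\hat\beta) - X_t^{\top} \hat a_k(X_{t-1}^{\top}\hat\beta).$$

By (1.4), we have the following synthetic GARCH model

$$\sigma_{k,t}^2 = \alpha_{k,0} + \sum_{i=1}^{m} \alpha_{k,i}\, r_{k,t-i}^2 + \sum_{j=1}^{s} \gamma_{k,j}\, \sigma_{k,t-j}^2, \quad t = 2, \dots, n, \tag{2.5}$$

which is equivalent to

$$r_{k,t}^2 = \alpha_{k,0} + \sum_{i=1}^{\max(m,s)} (\alpha_{k,i} + \gamma_{k,i})\, r_{k,t-i}^2 + \eta_{k,t} - \sum_{j=1}^{s} \gamma_{k,j}\, \eta_{k,t-j}, \quad t = 2, \dots, n,$$

where $\eta_{k,t} = r_{k,t}^2 - \sigma_{k,t}^2$, $\gamma_{k,i} = 0$ when $i > s$, and $\alpha_{k,i} = 0$ when $i > m$.

Once $\alpha_{k,i}$ and $\gamma_{k,j}$ have been estimated, by substituting them into (2.5) and setting initial values
for the terms with index $l < \max(m, s)$, we can obtain an estimator of $\sigma_{k,t}^2$ and hence an estimator $\hat\Sigma_{0,t}$ of $\Sigma_{0,t}$.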

For each $k$, $k = 1, \dots, p_n$, let $\theta_k = (\alpha_{k,0}, \dots, \alpha_{k,m}, \gamma_{k,1}, \dots, \gamma_{k,s})^{\top}$. We use
a quasi-maximum likelihood approach to estimate $\theta_k$. We define the negative quasi log-likelihood
function of $\theta_k$ as

$$Q_{k,n}(\theta_k) = n^{-1} \sum_{t=2}^{n} \left\{ \frac{r_{k,t}^2}{\sigma_{k,t}^2(\theta_k)} + \log \sigma_{k,t}^2(\theta_k) \right\}, \tag{2.6}$$

where $\sigma_{k,t}^2(\theta_k)$ are recursively defined by (2.5) with initial values being either


$$r_{k,0}^2 = \dots = r_{k,1-m}^2 = \sigma_{k,0}^2 = \dots = \sigma_{k,1-s}^2 = \alpha_{k,0}$$

or

$$r_{k,0}^2 = \dots = r_{k,1-m}^2 = \sigma_{k,0}^2 = \dots = \sigma_{k,1-s}^2 = r_{k,1}^2.$$

By minimising $Q_{k,n}(\theta_k)$ with respect to $\theta_k$ on a compact set $\Lambda$ defined in (B3) in Appendix A, we
use the minimiser $\hat\theta_k$ to estimate $\theta_k$.
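The recursion in (2.5) and the criterion in (2.6) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the function names are hypothetical, the pre-sample squared residuals and variances are all set to the intercept (the first of the two initialisations above), and the criterion is averaged over the residuals supplied rather than divided by the full sample size. Any numerical optimiser over a compact parameter set could then be applied to `neg_quasi_loglik`.

```python
import math

def garch_sigma2(r, alpha, gamma, init=None):
    """Conditional variances from the GARCH recursion.

    r     : list of residuals
    alpha : [alpha_0, alpha_1, ..., alpha_m]
    gamma : [gamma_1, ..., gamma_s]
    init  : value used for all pre-sample terms (defaults to alpha_0)
    """
    m, s = len(alpha) - 1, len(gamma)
    init = alpha[0] if init is None else init
    r2 = [x * x for x in r]
    sigma2 = []
    for t in range(len(r)):
        value = alpha[0]
        for i in range(1, m + 1):
            value += alpha[i] * (r2[t - i] if t - i >= 0 else init)
        for j in range(1, s + 1):
            value += gamma[j - 1] * (sigma2[t - j] if t - j >= 0 else init)
        sigma2.append(value)
    return sigma2

def neg_quasi_loglik(r, alpha, gamma, init=None):
    """Average of r_t^2 / sigma_t^2 + log(sigma_t^2) over the supplied residuals."""
    sigma2 = garch_sigma2(r, alpha, gamma, init)
    return sum(x * x / s2 + math.log(s2) for x, s2 in zip(r, sigma2)) / len(r)

# Small hand-checkable example with m = s = 1.
example_sigma2 = garch_sigma2([1.0, 2.0], alpha=[0.5, 0.1], gamma=[0.1])
constant_case = neg_quasi_loglik([math.sqrt(2.0)] * 4, alpha=[2.0, 0.0], gamma=[0.0])
```

In the constant case every conditional variance equals the squared residual, so the criterion reduces to one plus the log of that common value.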
3 Bandwidth selection

The choice of the bandwidth $h$, used in the estimation of $\beta$, is not crucial. According to some
numerical analysis not presented in this paper for brevity, the accuracy of the estimator $\hat\beta$ is not
very sensitive to $h$, as long as $h$ is within a reasonable range. In the computational algorithm
for estimating $\beta$ (see Section 5), we recommend choosing a bandwidth $h$ equal to around 20% of the
following range

$$\max\{X_1^{\top}\tilde\beta, \dots, X_n^{\top}\tilde\beta\} - \min\{X_1^{\top}\tilde\beta, \dots, X_n^{\top}\tilde\beta\}, \tag{3.1}$$

where $\tilde\beta$ is a randomly chosen initial estimate of $\beta$. We update $h$ on subsequent iterations by
replacing $\tilde\beta$ in (3.1) with the most recent estimate of $\beta$. This approach is employed in the simulation
studies and real data analysis of this paper.
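The rule of thumb for $h$ amounts to one line of arithmetic on the projected factors. A minimal sketch follows; the function name `index_bandwidth` and the toy inputs are assumptions used only for illustration.

```python
def index_bandwidth(X, beta, fraction=0.2):
    """Bandwidth equal to a fraction of the range of the index values X_t^T beta."""
    index = [sum(x_i * b_i for x_i, b_i in zip(row, beta)) for row in X]
    return fraction * (max(index) - min(index))

# Toy example: index values are 0.6, 0.8 and -0.6, so the range is 1.4.
h = index_bandwidth([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], beta=[0.6, 0.8])
```

Inside an iterative scheme the same function would simply be called again with the most recent estimate of the index direction.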

We now focus on the selection of the bandwidth $h_1$, used in the estimation of $g(\cdot)$ and $\Phi(\cdot)$. The
proposed bandwidth selection is based on a $k$-nearest neighbours bandwidth with $k$ being selected
by cross-validation. We define the cross-validation statistic by

$$\mathrm{CV}(k) = \sum_{t=n-M}^{n} \left\| Y_t - \hat g^{(t-1)}(X_{t-1}^{\top}\hat\beta) - \hat\Phi^{(t-1)}(X_{t-1}^{\top}\hat\beta) X_t \right\|, \tag{3.2}$$

where $\hat g^{(t-1)}(\cdot)$ and $\hat\Phi^{(t-1)}(\cdot)$ are the respective estimators of $g(\cdot)$ and $\Phi(\cdot)$ using a $k$-nearest
neighbours bandwidth based on $(X_l^{\top}, Y_l^{\top})$, $l = 1, \dots, t - 1$, and where $M$ is a look-back integer
parameter such that $M < n - 1$.

Hence, denoting the $k$ that minimises $\mathrm{CV}(k)$ by $\hat k$, we use a $\hat k$-nearest neighbours bandwidth in
the estimation of $g(\cdot)$ and $\Phi(\cdot)$. The bandwidth $h_2$ in the estimation of $\Sigma_x(\cdot)$ or $E(X_t \mid X_{t-1} = u)$
can also be selected by cross-validation in a similar way.

4 Asymptotic properties 

In this section, we present the asymptotic properties of the proposed estimators.
We first introduce the following notation which will be used throughout this paper. For any
matrix $A = (a_{ij})_{m \times N}$, we use $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ to denote respectively the smallest and largest
eigenvalues of $A$. The trace of $A$ is denoted by $\mathrm{tr}(A)$, the Frobenius norm of $A$ by $\|A\|_F$, and the
spectral norm (also called operator norm) and element-wise norm by

$$\|A\| = \{\lambda_{\max}(A^{\top} A)\}^{1/2} \quad \text{and} \quad \|A\|_{\infty} = \max_{1 \le i \le m,\ 1 \le j \le N} |a_{ij}|$$

respectively. We also define


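$$U_n = \frac{1}{n p_n} \sum_{i=2}^{n} \sum_{k=1}^{p_n} f(X_{i-1}^{\top}\beta) \{X_{i-1} - E(X_{i-1} \mid X_{i-1}^{\top}\beta)\} \{\dot g_k(X_{i-1}^{\top}\beta) + \dot a_k(X_{i-1}^{\top}\beta)^{\top} X_i\}\, \epsilon_{k,i}$$

and

$$V_n = p_n^{-1} \sum_{k=1}^{p_n} E\left( f(X_1^{\top}\beta) \{X_1 - E(X_1 \mid X_1^{\top}\beta)\} \{X_1 - E(X_1 \mid X_1^{\top}\beta)\}^{\top} \{\dot g_k(X_1^{\top}\beta) + \dot a_k(X_1^{\top}\beta)^{\top} X_2\}^2 \right).$$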

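Theorem 1. Under assumptions (A1 - A5), (B1 - B4), (C1) and (C3) in Appendix A, there exist
$C > 0$ and a small $\epsilon > 0$ such that

(I)
$$P\left( \left\| \hat\beta - \beta - V_n^{-1} U_n \right\| > C \left\{ h^4 + \frac{\log(n)}{nh} \right\} \right) \le O\left( n^{-(1+\epsilon)} \right),$$

(II)
$$P\left( \sup_{z \in \mathcal{Z}} \left\| \hat g(z) - g(z) \right\|_{\infty} > C \left\{ h_1^2 + \sqrt{\frac{\log(n)}{n h_1}} \right\} \right) \le O\left( n^{-(1+\epsilon)} \right),$$

(III)
$$P\left( \sup_{z \in \mathcal{Z}} \left\| \hat\Phi(z) - \Phi(z) \right\|_{\infty} > C \left\{ h_1^2 + \sqrt{\frac{\log(n)}{n h_1}} \right\} \right) \le O\left( n^{-(1+\epsilon)} \right),$$

(IV)
$$P\left( \sup_{1 \le k \le p_n} \left\| \hat\theta_k - \theta_k \right\| > C \left\{ h_1^2 + \sqrt{\frac{\log(n)}{n h_1}} \right\} \right) \le O\left( n^{-(1+\epsilon)} \right),$$

where $\mathcal{Z}$ is a compact subset of the range of $X^{\top}\beta$.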


Remark 1. Theorem 1 shows that $\|\hat\beta - \beta\| = o_P(n^{-1/2})$ when $p_n$ diverges to $\infty$ as $n \to \infty$,
provided that $\|U_n\| = o_P(n^{-1/2})$. It indicates that the index $\beta$ is estimated with a rate faster than
the normal rate, which is the optimal rate if $p_n$ is fixed. This is known as a ‘blessing of high
dimensionality’.


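The main interest of this paper is to estimate $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$. To measure the accuracy of an
estimator $\hat M$ of a matrix $M$ of size $p_n \times p_n$, we use the entropy loss norm, proposed by James and Stein
(1961),

$$\left\| \hat M - M \right\|_{\Sigma} = p_n^{-1/2} \left\| M^{-1/2} (\hat M - M) M^{-1/2} \right\|_F.$$

To facilitate our presentation, we focus on the convergence of $\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)$ to
$\mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)$, after obtaining the data $\{(X_1^{\top}, Y_1^{\top}), \dots, (X_n^{\top}, Y_n^{\top})\}$.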



Theorem 2. Under assumptions (A1 - A5), (B1 - B4) and (C1 - C4) in Appendix A, there
exist C > 0 and e > 0 such that, with probability at least 1 — 


cov{Yn+i\iFn) 


COv(y„+l|J'„)|||, < PnC 



V nhi J 


+ C 



logn 

nhi 


+ Pn^C 



logn\ 

nhl ) ' 


Fan, Fan and Lv (2008) and Fan, Liao and Mincheva (2011) showed that an estimator of a covariance
matrix based on a certain structure would achieve a higher convergence rate than the sample
covariance matrix. Theorem 2 tells us the same story. There are three terms to measure the
accuracy of $\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n) - \mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)$. The first two terms tell us how the nonparametric
smoothing steps in estimating $\Phi(\cdot)$ affect the performance of $\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)$, and the third term
evaluates the influence of the conditional covariance matrix $\Sigma_x(X_n)$. It turns out that even though
$q$-dimensional smoothing is required, its effect is small and often negligible if $p_n$ is large.


5 Computational algorithm 

To implement the proposed estimation procedure for $\mathrm{cov}(Y_t \mid \mathcal{F}_{t-1})$, the hardest part is to compute
an estimate of $\beta$, which is equivalent to finding the minimum of

$$L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta)$$

under the conditions

$$\|\beta\| = 1, \quad \beta_1 > 0.$$

We now introduce the proposed iterative algorithm which can be used to do this minimisation. Let

$$\tilde L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta, b)$$

$$= \sum_{j=1}^{n-1} \sum_{i=2}^{n} \left\| Y_i - g_j - A_j X_i - (\dot g_j + B_j X_i)(X_{i-1} - X_j)^{\top}\beta \right\|^2 K_h\big((X_{i-1} - X_j)^{\top} b\big),$$

which is $L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta)$ with the $\beta$ in the kernel function
being replaced by $b$. First of all, randomly choose an initial estimate for $\beta$, denoted by $\tilde\beta$, such
that $\|\tilde\beta\| = 1$ and the first component of $\tilde\beta$ is positive. Then, iterate between the following two
steps until convergence:

(Step 1) If this is the first iteration, let $\beta_0 = \tilde\beta$. Otherwise, set $\beta_0$ equal to the $\hat\beta$ obtained from Step
2 of the previous iteration. Minimise

$$\tilde L(g_1, \dot g_1, A_1, B_1, \dots, g_{n-1}, \dot g_{n-1}, A_{n-1}, B_{n-1}, \beta_0, \beta_0)$$

with respect to $g_1$, $\dot g_1$, $A_1$, $B_1$, $\dots$, $g_{n-1}$, $\dot g_{n-1}$, $A_{n-1}$ and $B_{n-1}$, and denote the minimiser
by $\hat g_1$, $\hat{\dot g}_1$, $\hat A_1$, $\hat B_1$, $\dots$, $\hat g_{n-1}$, $\hat{\dot g}_{n-1}$, $\hat A_{n-1}$ and $\hat B_{n-1}$.

(Step 2) Minimise

$$\tilde L(\hat g_1, \hat{\dot g}_1, \hat A_1, \hat B_1, \dots, \hat g_{n-1}, \hat{\dot g}_{n-1}, \hat A_{n-1}, \hat B_{n-1}, \beta, \beta_0)$$

with respect to $\beta$. Denote the minimiser by $\check\beta$, and define $\hat\beta = \check\beta / \|\check\beta\|$ when the first
component of $\check\beta$ is positive and $\hat\beta = -\check\beta / \|\check\beta\|$ otherwise.

The $\hat\beta$ resulting from the convergence is the final estimate of $\beta$.

The proposed iterative algorithm is easy to implement as both minimisers in Step 1 and Step 2
have a closed form. Once an estimate of $\beta$ is obtained, the remaining computation of $\widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})$
becomes straightforward.
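The two-step iteration can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `estimate_beta`, the Epanechnikov kernel, the bandwidth set to 20% of the range of the lagged index, the stopping rule and the simulated data are all assumptions. Step 1 is solved with `numpy.linalg.lstsq` on kernel-weighted data for each anchor point, and Step 2 uses the closed-form minimiser of the quadratic in the index direction.

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def estimate_beta(Y, X, beta_init, n_iter=20, tol=1e-6):
    """Alternate between local fits (Step 1) and a quadratic update of beta (Step 2)."""
    n, q = X.shape
    lag, cur, resp = X[:-1], X[1:], Y[1:]      # X_{i-1}, X_i, Y_i for i = 2..n
    beta = np.asarray(beta_init, dtype=float)
    beta = beta / np.linalg.norm(beta)
    for _ in range(n_iter):
        idx = lag @ beta
        h = 0.2 * (idx.max() - idx.min())      # assumed bandwidth rule
        M, b = np.zeros((q, q)), np.zeros(q)
        for j in range(n - 1):                 # anchor points X_1, ..., X_{n-1}
            v = lag - lag[j]                   # X_{i-1} - X_j
            d = v @ beta
            w = epan(d / h) / h
            keep = w > 0
            vk, wk, dk = v[keep], w[keep], d[keep]
            base = np.hstack([np.ones((len(dk), 1)), cur[keep]])   # (1, X_i^T)
            Z = np.hstack([base, dk[:, None] * base])
            sw = np.sqrt(wk)[:, None]
            # Step 1: weighted least squares for (g_j, A_j, gdot_j, B_j).
            C = np.linalg.lstsq(Z * sw, resp[keep] * sw, rcond=None)[0]
            e = resp[keep] - base @ C[:q + 1]  # Y_i - g_j - A_j X_i
            c = base @ C[q + 1:]               # gdot_j + B_j X_i
            # Accumulate the normal equations of Step 2 (kernel weights held fixed).
            M += (vk * (wk * (c**2).sum(axis=1))[:, None]).T @ vk
            b += vk.T @ (wk * (e * c).sum(axis=1))
        new = np.linalg.lstsq(M, b, rcond=None)[0]
        new = new / np.linalg.norm(new)
        if new[0] < 0:
            new = -new
        converged = np.linalg.norm(new - beta) < tol
        beta = new
        if converged:
            break
    return beta

# Simulated illustration (hypothetical coefficient functions, small noise).
rng = np.random.default_rng(1)
n, p, q = 120, 3, 2
beta_true = np.array([0.6, 0.8])
X = rng.uniform(-1.0, 1.0, size=(n, q))
z = X[:-1] @ beta_true
scale = np.array([1.0, 2.0, 3.0])
Y = np.zeros((n, p))
Y[1:] = (np.sin(z)[:, None] * scale
         + (1.0 + z)[:, None] * X[1:, [0]]
         + (z[:, None] * scale) * X[1:, [1]]
         + 0.05 * rng.normal(size=(n - 1, p)))
beta_hat = estimate_beta(Y, X, beta_init=rng.normal(size=q))
```

The cost of each iteration grows quadratically in the sample size, so for long series one might restrict the anchor points to a subsample; that variation is not discussed in the text above.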

6 Portfolio allocation 

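In this section, we will briefly describe the construction of an estimated optimal portfolio
allocation based on the proposed dynamic structure and the associated estimation procedure. Since
the formula for optimal portfolio allocation contains $E(Y_t \mid \mathcal{F}_{t-1})$, we shall introduce its estimator
$\hat E(Y_t \mid \mathcal{F}_{t-1})$ first. By taking conditional expectation of (1.3), we have

$$E(Y_t \mid \mathcal{F}_{t-1}) = g(X_{t-1}^{\top}\beta) + \Phi(X_{t-1}^{\top}\beta) E(X_t \mid X_{t-1}).$$

Therefore, we use

$$\hat E(Y_t \mid \mathcal{F}_{t-1}) = \hat g(X_{t-1}^{\top}\hat\beta) + \hat\Phi(X_{t-1}^{\top}\hat\beta) \hat E(X_t \mid X_{t-1}) \tag{6.1}$$

to estimate $E(Y_t \mid \mathcal{F}_{t-1})$, where $\hat E(X_t \mid X_{t-1})$ is defined in (2.3).

Our estimated optimal portfolio allocation builds on the mean-variance optimal portfolio by
Markowitz (1952, 1959). The allocation vector $w$ of $p_n$ risky assets, to be held between times $t - 1$
and $t$, is defined as the solution to

$$\min_{w}\ w^{\top} \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})\, w$$

$$\text{subject to } w^{\top} 1_{p_n} = 1 \text{ and } w^{\top} \hat E(Y_t \mid \mathcal{F}_{t-1}) = \delta,$$

where $\delta$ is the target return imposed on the portfolio. The solution $\hat w$ is given by

$$\hat w = \frac{c_3 - c_2 \delta}{c_1 c_3 - c_2^2}\, \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})^{-1} 1_{p_n} + \frac{c_1 \delta - c_2}{c_1 c_3 - c_2^2}\, \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})^{-1} \hat E(Y_t \mid \mathcal{F}_{t-1}), \tag{6.2}$$

where

$$c_1 = 1_{p_n}^{\top} \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})^{-1} 1_{p_n}, \quad c_2 = 1_{p_n}^{\top} \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})^{-1} \hat E(Y_t \mid \mathcal{F}_{t-1}),$$

$$c_3 = \hat E(Y_t \mid \mathcal{F}_{t-1})^{\top} \widehat{\mathrm{cov}}(Y_t \mid \mathcal{F}_{t-1})^{-1} \hat E(Y_t \mid \mathcal{F}_{t-1}).$$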
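The closed-form allocation in (6.2) can be checked numerically: the weights should sum to one and deliver the target expected return. The sketch below uses a hypothetical three-asset covariance matrix and mean vector; the function name `markowitz_weights` is an assumption, and linear solves are used instead of an explicit matrix inverse.

```python
import numpy as np

def markowitz_weights(cov, mu, delta):
    """Mean-variance weights with full investment and target return delta."""
    ones = np.ones(len(mu))
    cov_inv_ones = np.linalg.solve(cov, ones)   # cov^{-1} 1
    cov_inv_mu = np.linalg.solve(cov, mu)       # cov^{-1} mu
    c1 = ones @ cov_inv_ones
    c2 = ones @ cov_inv_mu
    c3 = mu @ cov_inv_mu
    det = c1 * c3 - c2**2
    return ((c3 - c2 * delta) / det) * cov_inv_ones + ((c1 * delta - c2) / det) * cov_inv_mu

# Hypothetical inputs for illustration only.
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
mu = np.array([0.010, 0.020, 0.015])
delta = 0.01
w = markowitz_weights(cov, mu, delta)
```

The denominator vanishes only when the mean vector is proportional to the vector of ones, in which case the two constraints cannot both be imposed independently.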
7 Simulation studies

In this section, we use a simulated example to show how well the proposed estimation
procedure and portfolio allocation work. We shall use $a_{ij}(\cdot)$ to denote the entry corresponding to
the $i$th row and $j$th column of $\Phi(\cdot)$.

We generate 1000 data sets from model (1.3) together with (1.4). We repeat this using the
following combinations of $n$ and $p_n$: $\{n = 1000, p_n = 50\}$, $\{n = 1000, p_n = 100\}$, $\{n = 2000, p_n =
50\}$ and $\{n = 2000, p_n = 100\}$. We set

$$q = 4, \quad m = 1, \quad s = 1, \quad \beta = \tfrac{1}{3}(1, 2, 0, 2)^{\top}.$$

For k = 1, ■ ■ ■ , Pn, we set 

ao,fc = 0.5, apfc = 0.1, = 0.1, 5 ^( 2 ;) = + 3 exp(- 2 :^), ak,i{z) = + 0.8z, 


ak, 2 {z) = ak, 3 {z) = '^ 3 ,k + 1.5sin(7r2;), ak,4{z) = '^4,k, 

where j. are some fixed parameters for j = 0, ■ ■ ■ , d and k = 1, ■ ■ ■ , pn- In order to define 
we simulate them independently from a uniform distribution on [—1, 1], and use these same 
values throughout all simulations. For $t = 1, \dots, n + 1$, we generate $X_t$ independently from a
uniform distribution on $[-1, 1]^q$, $Z_t$ from a $p_n$-variate standard normal distribution, and $\epsilon_t$ through
$\epsilon_t = \Sigma_{0,t}^{1/2} Z_t$. Once both $X_t$ and $\epsilon_t$ have been generated, $Y_t$ can be generated through (1.3) for
$t = 1, \dots, n + 1$.
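The error-generating step for a single component can be sketched in plain Python. The sketch assumes that the three values 0.5, 0.1 and 0.1 listed above are the GARCH(1,1) intercept, ARCH coefficient and GARCH coefficient respectively; the function name `simulate_garch_errors` and the choice to start the recursion at the stationary variance are further assumptions.

```python
import math
import random

def simulate_garch_errors(n, alpha0=0.5, alpha1=0.1, gamma1=0.1, seed=0):
    """Simulate eps_t = sigma_t * z_t with a GARCH(1,1) conditional variance."""
    rng = random.Random(seed)
    sigma2 = [alpha0 / (1.0 - alpha1 - gamma1)]      # assumed start: stationary variance
    eps = [math.sqrt(sigma2[0]) * rng.gauss(0.0, 1.0)]
    for t in range(1, n):
        s2 = alpha0 + alpha1 * eps[t - 1] ** 2 + gamma1 * sigma2[t - 1]
        sigma2.append(s2)
        eps.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return eps, sigma2

eps, sigma2 = simulate_garch_errors(1001)
```

Running the function once per component with different seeds gives mutually independent error series, matching the diagonal structure of the idiosyncratic covariance matrix.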

We will initially pretend that $(X_{n+1}^{\top}, Y_{n+1}^{\top})$ is unknown to us, and this will not be used in the
estimation of $\mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)$. The purpose of generating an additional data point $(X_{n+1}^{\top}, Y_{n+1}^{\top})$ is
to enable us to calculate the 1-period simple return

$$R(w) = w^{\top} Y_{n+1} \tag{7.1}$$

of a portfolio allocation $w$ formed at time $n$ based on data $(X_t^{\top}, Y_t^{\top})$, $t = 1, \dots, n$. In order to
evaluate the performance of an estimator $\hat M$ of matrix $M$ we use the following metric

A(M, M) = " 


We also use the Sharpe ratio

$$\mathrm{SR}(w) = \frac{E\{R(w)\}}{\mathrm{SD}\{R(w)\}}$$

to evaluate the performance of $w$, where $\mathrm{SD}\{R(w)\}$ is the standard deviation of $R(w)$. We assume
a zero risk-free rate for simplicity.
We first examine how well the estimation procedure works. We estimate $\mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)$ and
use $\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)^{-1}$ to estimate $\mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)^{-1}$. The kernel function in the estimation procedure
is taken to be the Epanechnikov kernel $K(u) = 0.75(1 - u^2)_+$, and the bandwidths are selected
by the methodology described in Section 3. The results, presented in Tables 1 and 2, show both
$\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)$ and $\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)^{-1}$ work very well.

Table 1: Mean and Standard Deviation of $\Delta(\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n), \mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n))$

          n = 1000    n = 1000    n = 2000    n = 2000
          p_n = 50    p_n = 100   p_n = 50    p_n = 100
E(D)      0.183       0.189       0.136       0.141
SD(D)     0.046       0.049       0.034       0.035

In this table, $D = \Delta(\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n), \mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n))$, and SD(D) is the
standard deviation of D.


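Table 2: Mean and Standard Deviation of $\Delta(\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)^{-1}, \mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)^{-1})$

          n = 1000    n = 1000    n = 2000    n = 2000
          p_n = 50    p_n = 100   p_n = 50    p_n = 100
E(D_1)    0.114       0.105       0.078       0.070
SD(D_1)   0.017       0.013       0.012       0.009

In this table, $D_1 = \Delta(\widehat{\mathrm{cov}}(Y_{n+1} \mid \mathcal{F}_n)^{-1}, \mathrm{cov}(Y_{n+1} \mid \mathcal{F}_n)^{-1})$, and SD(D_1) is
the standard deviation of D_1.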

We now examine the performance of the proposed portfolio allocation, using a target return
$\delta = 1\%$, by computing the return as described in (7.1). In order to see how much gain can be made
by making use of the dynamic structure, we make a comparison with portfolio allocations based
on Markowitz’s formula but where the covariance matrix is estimated using the sample covariance
matrix and also the estimator proposed by Fan, Fan and Lv (2008). The mean, standard deviation
and Sharpe ratio of the returns are presented in Table 3. For each situation discussed, we see
that the Sharpe ratio of the proposed portfolio allocation is much bigger than those of the other two portfolio
allocations. This suggests there is a significant gain from making use of the dynamic structure of the
covariance matrix.
Table 3: Means, Standard Deviations and Sharpe Ratios

              n = 1000    n = 1000    n = 2000    n = 2000
              p_n = 50    p_n = 100   p_n = 50    p_n = 100
E{R(w)}       0.99%       1.01%       1.03%       1.03%
E{R(w_1)}     0.96%       0.96%       1.02%       1.02%
E{R(w_2)}     0.96%       0.96%       1.02%       1.02%
SD{R(w)}      0.40%       0.28%       0.39%       0.27%
SD{R(w_1)}    1.02%       1.03%       1.03%       1.02%
SD{R(w_2)}    0.99%       0.97%       1.02%       1.00%
SR(w)         2.49        3.57        2.63        3.83
SR(w_1)       0.94        0.93        0.99        1.00
SR(w_2)       0.97        0.99        1.00        1.02

In this table we denote the proposed portfolio allocation by w, the portfolio
allocation formed by Markowitz’s formula using the sample covariance
matrix by w_1, and the portfolio allocation formed by Markowitz’s formula using
the estimated covariance matrix from Fan, Fan and Lv (2008) by w_2.


8 Real data analysis 

In this section, we are going to apply the dynamic structure for covariance matrices to a real data 
set. We use the term Face (Factor model with an Adaptive-varying-coefficient-model structure 
Covariance matrix Estimator) to denote the proposed portfolio allocation. This name was chosen 
because the estimator will ‘face’ the markets today based on what happened yesterday and adapt 
according to the dynamic structure. We compare Face with the allocation based on the sample 
covariance matrix (denoted by Sam), and the allocation proposed by Fan, Fan and Lv (2008) 
(denoted by Fan). In all three cases, we use the same target return $\delta = 1\%$. We also make a
comparison with the market portfolio (denoted by Market) since this serves as an important benchmark
indicating whether we are in a bull or bear market. In this section, the kernel function used in the 
construction of Face is still taken to be the Epanechnikov kernel, and the bandwidths are selected 
by the method described in Section 3. 

All data used can be freely downloaded from Kenneth French’s website
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html and was accessed on 2nd April
2015. The response variable $Y_t$ is chosen to be the vector of the daily returns of $p_n = 49$ industry
portfolios (value weighted) minus the risk-free rate. The observable factors $x_{1,t}$, $x_{2,t}$ and $x_{3,t}$ are
taken to be the market, size and value factors respectively from the Fama-French three-factor model.
The labelling along with a brief description of $Y_t = (y_{1,t}, \dots, y_{49,t})^{\top}$ and $X_t = (x_{1,t}, x_{2,t}, x_{3,t})^{\top}$
can be found in Table 4 and Table 5 respectively.

There are various advantages of using the portfolio returns for $Y_t$ as opposed to using individual
stocks: we avoid having to merge different sources of data; we avoid survivorship bias (where we 
only picked companies that did not go bankrupt); and we attempt to avoid company specific risk. 
A further benefit is that the results we give are entirely reproducible since the data is free and 
presented in a spreadsheet format. 

To have a better idea about what the data is like, we plot the observations from 3rd January 
1995 to 31st December 2014 of the three factors and the risk-free rate in Figure 1, and the first four 
components of $Y_t$ in Figure 2 corresponding to the industrial sectors Agriculture, Food Products,
Candy & Soda, and Beer & Liquor. The plots show clearly that there are periods of large volatility
around the 2008-2009 financial crisis. We will see Face performs reasonably well even during that 
period, whilst the others do not. 

We compare the three portfolio allocations (Face, Sam and Fan), along with the market portfolio,
year by year from 1995 to 2014 using a simple trading strategy. For each year we trade on each
trading day, which is approximately T = 252 trading days per year. At the beginning of each year
we assume we have an initial balance of 100 pounds. Although this initial choice is arbitrary, it is 
a useful way of comparing the performance during the course of a year. We assume no transaction 
costs, allow for short selling, and assume that all possible portfolio allocations are attainable. Our 
trading strategy consists of forming a portfolio allocation w at the end of each trading day and holding
it until the end of the next trading day. Between day t — l and day t, we obtain the portfolio return 

J?t(w) = 


where $w$ is formed based on the $n$ most recent observations $(X_{t-j}^{\top}, Y_{t-j}^{\top})$, $j = 1, \dots, n$, for some look-back integer $n$. With the
realised returns $R_t(w)$, $t = 1, \dots, T$, we can calculate the annualized Sharpe ratio

$$\mathrm{SR}(w) = \frac{\bar R(w)}{\mathrm{SD}(R)} \sqrt{T},$$

where

$$\bar R(w) = \frac{1}{T} \sum_{t=1}^{T} \{R_t(w) - R_{f,t}\},$$

$\mathrm{SD}(R)$ is the sample standard deviation of the excess returns $R_t(w) - R_{f,t}$, $t = 1, \dots, T$,
and $R_{f,t}$ is the risk-free rate on day $t$. Hence, for each year, and for each of the four trading
strategies, we compute an annualized Sharpe ratio and the balance at the end of the final trading
day of the year. We repeat this using n = 100, 300, and 500. From the annualized Sharpe
ratios presented in Figure 4 and the balances in Table 6, it is clear that Face performs significantly 
better than the other three. 
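The two yearly summaries can be sketched in plain Python. The function names are hypothetical, and two conventions are assumed here because the text does not pin them down: the standard deviation uses the population form (dividing by the number of days), and the balance compounds the daily portfolio returns multiplicatively from the initial 100.

```python
import math
import statistics

def annualized_sharpe(returns, risk_free):
    """Mean excess return over its standard deviation, scaled by sqrt(T)."""
    excess = [r - rf for r, rf in zip(returns, risk_free)]
    return statistics.fmean(excess) / statistics.pstdev(excess) * math.sqrt(len(excess))

def end_balance(returns, start=100.0):
    """Balance after compounding the daily returns (assumed convention)."""
    balance = start
    for r in returns:
        balance *= 1.0 + r
    return balance

# Two-day toy example with a zero risk-free rate.
daily_returns = [0.01, 0.03]
sharpe = annualized_sharpe(daily_returns, [0.0, 0.0])
balance = end_balance(daily_returns)
```

With a full year of data the lists would hold roughly 252 entries, and the same two functions would be applied to each strategy and each year in turn.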
We remark that although Face, Sam and Fan are all constructed based on Markowitz’s formula,
the difference between them lies in the way the covariance matrix of returns, which
appears in Markowitz’s formula, is estimated. Neither Sam nor Fan takes into account the dynamic feature
of the covariance matrix in its estimation, but Face does. This is the fundamental reason why
Face performs significantly better than Sam and Fan. One may argue that if Sam and Fan used
fewer observations in their moving window to estimate the covariance matrix they would start to
take the dynamic feature into account, potentially improving their performance. However, when
constructing Face, Sam and Fan, we tried a variety of n, ranging from 100 to 500, and found Face
always performs better. This suggests that even if Sam and Fan only use the observations in a
carefully chosen moving window, Face still outperforms them.

To have a tangible idea about whether the covariance matrix is dynamic or not, we plot the 
estimated intercept and coefficients of $x_{1,t}$, $x_{2,t}$ and $x_{3,t}$, interpreted as the impact of the factors,
for each of the first four components of Yt in Figure 3. One can see that these coefficients are 
dynamic rather than constant, which implies the covariance matrix is also dynamic. 

It is interesting to have a closer look at the performances of the four strategies in the volatile 
time period 2007-2009 during which the financial crisis took place. Still assuming an initial balance 
of 100 pounds at the start of each year, and using n = 500, we plot the balances at the end of each 
trading day in Figure 5. During 2007, Face, Sam and Fan all perform reasonably well, with Face 
slightly better. The market does not make much profit, and is beaten by the other three. In 2008,
Face continues to do well whilst the other three make no profit at all. In 2009, although
Face does not do very well during some time periods, it adapts to the market change quickly and 
almost breaks even. The reason that Face can adapt to market change quickly is because it takes 
into account the dynamic feature of the covariance matrix of returns. On the other hand, both 
Sam and Fan do very poorly, and in fact they almost lose all their money at the end of the year. 
In 2009, the market performs best, but still with very little profit. 





Table 4: Description of the 49 industry portfolios

k    y_{k,t}   Industry name                      k    y_{k,t}   Industry name
1    Agric     Agriculture                        26   Guns      Defense
2    Food      Food Products                      27   Gold      Precious Metals
3    Soda      Candy & Soda                       28   Mines     Industrial Metal Mining
4    Beer      Beer & Liquor                      29   Coal      Coal
5    Smoke     Tobacco Products                   30   Oil       Petroleum and Natural Gas
6    Toys      Recreation                         31   Util      Utilities
7    Fun       Entertainment                      32   Telcm     Communication
8    Books     Printing and Publishing            33   PerSv     Personal Services
9    Hshld     Consumer Goods                     34   BusSv     Business Services
10   Clths     Apparel                            35   Hardw     Computers
11   Hlth      Healthcare                         36   Softw     Computer Software
12   MedEq     Medical Equipment                  37   Chips     Electronic Equipment
13   Drugs     Pharmaceutical Products            38   LabEq     Measuring and Control Equipment
14   Chems     Chemicals                          39   Paper     Business Supplies
15   Rubbr     Rubber and Plastic Products        40   Boxes     Shipping Containers
16   Txtls     Textiles                           41   Trans     Transportation
17   BldMt     Construction Materials             42   Whlsl     Wholesale
18   Cnstr     Construction                       43   Rtail     Retail
19   Steel     Steel Works Etc                    44   Meals     Restaurants, Hotels, Motels
20   FabPr     Fabricated Products                45   Banks     Banking
21   Mach      Machinery                          46   Insur     Insurance
22   ElcEq     Electrical Equipment               47   RlEst     Real Estate
23   Autos     Automobiles and Trucks             48   Fin       Trading
24   Aero      Aircraft                           49   Other     Almost Nothing
25   Ships     Shipbuilding, Railroad Equipment

This table gives the labelling and a brief description of industrial sectors
which form the 49 Industry Portfolios data set. Precise details of their
construction are given on Kenneth French’s website.


Table 5: Description of the Fama and French factors

j    Name of x_{j,t}   Description
1    Market factor     Return on the market minus the risk-free rate
2    Size factor       Excess returns of small caps over big caps
3    Value factor      Excess returns of value stocks over growth stocks

This table gives the labelling and a brief description of market, size and
value factors from the Fama-French factors data set. Precise details of their
construction are given on Kenneth French’s website.
Figure 3: Estimated coefficient functions for industry portfolios 1-4
[Panel titles include: Intercept; Impact of Market Factor. Horizontal axes run from -1.0 to 1.0.]

This figure shows the estimated intercept and coefficient functions for the
market, size and value factors, for the first four industry portfolios (Agriculture,
Food Products, Candy & Soda, and Beer & Liquor) on the first day
of trading.
Figure 4: Annualized Sharpe Ratios

[Figure: panels labelled by sample size (n = 100, n = 500); horizontal axis: year, 1995 to 2013; legend: Face, Sam, Fan, Market.]

This figure shows the performance of the four trading strategies (Face, Sam,
Fan and Market) in terms of the annualized Sharpe ratio, using different
sample sizes n = 100, n = 300 and n = 500.
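
As a rough illustration of the performance measure plotted in Figure 4, the sketch below computes an annualized Sharpe ratio from a series of daily excess returns. The annualization factor of 252 trading days, the function name `annualized_sharpe` and the toy returns are assumptions made for illustration only; this section does not state the exact convention used for the figure.

```python
import math


def annualized_sharpe(daily_excess_returns, periods_per_year=252):
    """Annualized Sharpe ratio: sqrt(periods) * mean / sample standard deviation.

    The factor 252 is an assumed number of trading days per year.
    """
    n = len(daily_excess_returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(daily_excess_returns) / n
    # Unbiased sample variance of the daily excess returns.
    var = sum((r - mean) ** 2 for r in daily_excess_returns) / (n - 1)
    if var == 0:
        raise ValueError("returns have zero variance")
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)


# Hypothetical daily excess returns, not taken from the paper.
toy_returns = [0.010, -0.005, 0.004, 0.002, -0.001]
print(round(annualized_sharpe(toy_returns), 3))
```

A strategy with the same mean return but lower day-to-day variability gets a higher ratio, which is why the measure is sensitive to the quality of the covariance estimate behind the allocation.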
Figure 5: Trading strategies during the financial crisis

[Figure: one panel per year (2007, 2008, 2009); horizontal axis: trading date (dd/mm/yy) from the first to the last trading day of each year; legend: Face, Sam, Fan, Market.]

This figure shows the performance of the four trading strategies (Face, Sam,
Fan and Market) using n = 500 during 2007, 2008 and 2009 in terms of the
end of day balances, assuming an initial balance of 100 pounds at the start
of each year.
Table 6: Comparison of Balances of Trading Strategies

                n = 100             n = 300             n = 500
Year  Market    Face  Sam   Fan     Face  Sam   Fan     Face  Sam   Fan
1995  137       224   164   216     541   277   347     423   380   466
1996  121       159   101    96     184    56    72     212    95   115
1997  131       179   138   155     303   146   207     230    98   127
1998  124       178    79   134     317   330   299     442   340   273
1999  126       121    61    78     260   117   175     329   116   135
2000   88       176   102   133     253   155   120     160    54    42
2001   89       129    53    60     167    49    49     140    10     6
2002   79       164    73    69     222   150   142     196   212   176
2003  132       161    57    97     134    40    45     271    53    75
2004  112       112    67    95     132    55    56     180    75    63
2005  106       179   194   166     184   157   151     265   295   239
2006  115       149   119   121     184   114    95     150   103    76
2007  106       233   185   231     376   305   321     521   440   537
2008   63       143    73   104     203    79   114     361    37    32
2009  128       147    48    66     188     9     5      93     4     3
2010  117       129   109   100     107   169   148     152   220   140
2011  100       177   107    93     192    88   120     283   127   154
2012  116       158   117    96     122    60    83     144    71    68
2013  135       232   200   226     412   180   275     389   225   363
2014  112       158   133   134     152   114   131     162   114   178

In this table, the first two columns show the year and the balance on the
final trading day when investing in the market portfolio. The balances on
the final trading day for Face, Sam and Fan are grouped according to n = 100
(columns 3-5), n = 300 (columns 6-8) and n = 500 (columns 9-11).
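
One way to read Table 6 is to count the years in which a strategy finishes above the market portfolio. The sketch below does this for Face with n = 100, using the year-end balances copied from the Market column and the first Face column; the helper name `years_beating` is hypothetical.

```python
# Year-end balances copied from Table 6 (1995 to 2014).
years = list(range(1995, 2015))
market = [137, 121, 131, 124, 126, 88, 89, 79, 132, 112,
          106, 115, 106, 63, 128, 117, 100, 116, 135, 112]
face_n100 = [224, 159, 179, 178, 121, 176, 129, 164, 161, 112,
             179, 149, 233, 143, 147, 129, 177, 158, 232, 158]


def years_beating(strategy, benchmark):
    """Return the years in which the strategy balance strictly exceeds the benchmark."""
    return [y for y, s, b in zip(years, strategy, benchmark) if s > b]


wins = years_beating(face_n100, market)
not_won = sorted(set(years) - set(wins))
print(len(wins), "of", len(years), "years; exceptions:", not_won)
```

With these figures, Face (n = 100) ends the year strictly above the market in 18 of the 20 years; the exceptions are 1999 (121 against 126) and 2004 (a tie at 112).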


APPENDIX 

Appendix A: Regularity conditions 

We state the following assumptions. 

Assumption A1. (i) $\{X_t\}_{t \ge 1}$ is stationary and ergodic; (ii) $\{\epsilon_t\}_{t \ge 1}$ and $\{X_t\}_{t \ge 1}$ are independent;
(iii) the $X_t$'s are bounded with support $\mathcal{X}$, that is, $\sup_{t \ge 1} \|X_t\|_\infty < L$, a.s.

Let $P(A)$ be the probability of a measurable set $A$ and $E(X)$ be the expectation of a random
variable $X$. The following strong mixing condition (A2) is used to derive the asymptotic properties
of the index estimator and the local linear estimators of the nonparametric functions. Let $\mathcal{F}_{-\infty}^{0}$ and
$\mathcal{F}_{k}^{\infty}$ be the $\sigma$-algebras generated by $\{X_t, t \le 0\}$ and $\{X_t, t \ge k\}$, respectively, and define the $\alpha$-mixing
coefficient

$$\alpha(k) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\; B \in \mathcal{F}_{k}^{\infty}} |P(A)P(B) - P(AB)|.$$

Assumption A2. There exist positive constants $c$ and $0 < \rho < 1$ such that for all $k = 1, 2, \ldots$,

$$\alpha(k) < c\rho^{k}.$$
Assumption A3. (i) The kernel function $K(z)$ is a symmetric density function which is bounded
with a bounded support and satisfies the Lipschitz condition; (ii) The density function $f_b(z)$
of $X_t^{\top} b$ is twice differentiable and bounded away from zero on $\{z = x^{\top} b;\ x \in \mathcal{X},\ \|b - \beta\|_2 \le c_0\}$
with $0 < c_0 < 1$; (iii) The density function $f(x)$ of $X_t$ is bounded away from zero and twice
differentiable in $\mathcal{X}$, and the joint densities of $X_1$ and $X_k$ for all $k \ge 2$ are bounded.
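
As a concrete instance of a kernel satisfying Assumption A3(i), the sketch below uses the Epanechnikov kernel, which is a symmetric density with bounded support and is Lipschitz continuous. The choice of this particular kernel and the scaling convention $K_h(z) = K(z/h)/h$ are assumptions made for illustration; the assumption itself does not single out a kernel.

```python
import numpy as np


def epanechnikov(z):
    """K(z) = 0.75 * (1 - z^2) on [-1, 1] and zero elsewhere."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)


def scaled_kernel(z, h):
    """K_h(z) = K(z / h) / h for a bandwidth h > 0 (assumed convention)."""
    return epanechnikov(np.asarray(z, dtype=float) / h) / h


# Numerical checks on a fine grid: total mass and second moment of K.
grid = np.linspace(-1.5, 1.5, 30001)
step = grid[1] - grid[0]
mass = float(np.sum(epanechnikov(grid)) * step)            # close to 1
mu2 = float(np.sum(grid ** 2 * epanechnikov(grid)) * step)  # close to 1/5
print(round(mass, 4), round(mu2, 4))
```

The second moment computed here corresponds to the constant $\mu_2 = \int u^2 K(u)\,du$ that appears in the bias terms of the local linear estimators in Appendix B.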

Assumption A4. $g(z)$ and $\Phi(z)$ have continuous third derivatives in $\mathcal{Z} = \{z : z = x^{\top}\beta,\ x \in \mathcal{X}\}$.

Assumption A5. $\|V_p - V\| = o(1)$, as $p_n \to \infty$, for some $q \times q$ symmetric positive definite $V$ such
that $\lambda_{\min}(V)$ is bounded away from zero.

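For the error process $\{\epsilon_t, t \ge 1\}$, the following assumptions are stated. Denote the true value
$\theta_{\ell,0} = (\alpha_{\ell,0}, \alpha_{\ell,1}, \ldots, \alpha_{\ell,m}, \gamma_{\ell,1}, \ldots, \gamma_{\ell,s})^{\top}$ for $\ell = 1, \ldots, p_n$.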

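Assumption B1. For each $\ell = 1, \ldots, p_n$, $\{\epsilon_{\ell,t},\ t = 0, \pm 1, \pm 2, \ldots\}$ is a strictly stationary
GARCH($m$, $s$) process with $\sup_{1 \le \ell \le p_n} E|\epsilon_{\ell,t}|^{2d} < \infty$ with $d > 4$.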

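Assumption B2. Let $\eta_{\ell,t} = \epsilon_{\ell,t}/\sigma_{\ell,t}$ for each $t$ and $\ell$. Then, for each $\ell = 1, \ldots, p_n$, the innovations
$\eta_{\ell,t}$'s are i.i.d. and absolutely continuous with Lebesgue density being strictly positive in a
neighbourhood of zero. Furthermore, $E(\eta_{\ell,t}) = 0$, $E(\eta_{\ell,t}^2) = 1$ and $\sup_{\ell \le p_n} E(|\eta_{\ell,t}|^{2d}) < \infty$ with $d$
defined in Assumption (B1).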

Assumption B3. For each $\ell = 1, \ldots, p_n$, the true value $\theta_{\ell,0}$ is an interior point of the compact set
$\Lambda$ and $\Lambda \subset (c, +\infty) \times (c, +\infty)^{m+s}$ for a constant $c > 0$.

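Assumption B4. Let $A_{\ell,\theta}(z) = \sum_{i=1}^{m} \alpha_{\ell,i} z^{i}$ and $B_{\ell,\theta}(z) = 1 - \sum_{j=1}^{s} \gamma_{\ell,j} z^{j}$, $\ell = 1, \ldots, p_n$. If
$s > 0$, $A_{\ell,\theta_{\ell,0}}(z)$ and $B_{\ell,\theta_{\ell,0}}(z)$ have no common roots, $A_{\ell,\theta_{\ell,0}}(1) \ne 0$, and $\alpha_{\ell,m} + \gamma_{\ell,s} \ne 0$ at the true value $\theta_{\ell,0}$.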
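
To make the error model in Assumptions (B1)-(B4) concrete, the sketch below simulates one GARCH(1, 1) series, that is, the case $m = s = 1$ with $\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \gamma_1 \sigma_{t-1}^2$ and $\epsilon_t = \sigma_t \eta_t$. The Gaussian innovations, the parameter values, the burn-in length and the function name `simulate_garch11` are all illustrative assumptions. The check `alpha1 + gamma1 < 1` enforces second-order stationarity, which is stronger than the strict stationarity required in (B1) but gives a simple starting value.

```python
import numpy as np


def simulate_garch11(n, alpha0, alpha1, gamma1, seed=0, burn_in=500):
    """Simulate n observations of a GARCH(1, 1) process with N(0, 1) innovations."""
    if min(alpha0, alpha1, gamma1) <= 0 or alpha1 + gamma1 >= 1:
        raise ValueError("need positive parameters with alpha1 + gamma1 < 1")
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eta = rng.standard_normal(total)        # i.i.d. innovations, mean 0, variance 1
    sigma2 = np.empty(total)
    eps = np.empty(total)
    sigma2[0] = alpha0 / (1.0 - alpha1 - gamma1)  # unconditional variance as start
    eps[0] = np.sqrt(sigma2[0]) * eta[0]
    for t in range(1, total):
        # Conditional variance recursion.
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + gamma1 * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * eta[t]
    # Drop the burn-in so the retained path is close to the stationary regime.
    return eps[burn_in:], sigma2[burn_in:]


eps, sigma2 = simulate_garch11(1000, alpha0=0.1, alpha1=0.1, gamma1=0.8)
print(eps.shape, float(sigma2.min()) >= 0.1)
```

Because every term in the recursion is nonnegative, the conditional variance never falls below $\alpha_0$, which is the kind of lower bound that Assumption (B3) provides through the constant $c$.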

For the bandwidths $h$, $h_1$, $h_2$ and the dimension $p_n$, we require the following assumptions.

Assumption C1. (i) The bandwidths $h$ and $h_1$ satisfy $h = O(n^{-\tau})$ and $h_1 = O(n^{-\tau_1})$ respectively,
with $1/6 < \tau, \tau_1 < 1/4$.

Assumption C2. The bandwidth $h_2$ satisfies $h_2 = O(n^{-\tau_2})$ with $1/(2q + 4) < \tau_2 < 1/(2q + 2)$.

Assumption C3. The dimension $p_n$ satisfies $p_n \le C n^{\epsilon}$ for some constants $C > 0$ and
$0 < 2\epsilon < d/2 - 2$.
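
The rate conditions in Assumptions (C1) and (C2) can be turned into a simple bandwidth rule. The sketch below is one such rule under stated assumptions: the multiplicative constants are taken to be 1, the default exponents (1/5 for $h$ and $h_1$, the midpoint of the admissible interval for $h_2$) are arbitrary admissible choices, and the function name `bandwidths` is hypothetical. The assumptions only fix the rates, not the constants.

```python
def bandwidths(n, q, tau=0.2, tau1=0.2, tau2=None):
    """Return (h, h1, h2) = (n^-tau, n^-tau1, n^-tau2) after checking (C1)-(C2)."""
    if not (1 / 6 < tau < 1 / 4 and 1 / 6 < tau1 < 1 / 4):
        raise ValueError("tau and tau1 must lie in (1/6, 1/4)")
    lower, upper = 1 / (2 * q + 4), 1 / (2 * q + 2)
    if tau2 is None:
        tau2 = 0.5 * (lower + upper)  # midpoint of the admissible interval
    if not lower < tau2 < upper:
        raise ValueError("tau2 must lie in (1/(2q+4), 1/(2q+2))")
    return n ** -tau, n ** -tau1, n ** -tau2


# Example: n = 500 observations and q = 3 factors.
h, h1, h2 = bandwidths(500, 3)
print(round(h, 3), round(h1, 3), round(h2, 3))
```

The admissible exponent for $h_2$ shrinks as $q$ grows, so the bandwidth used for the $q$-dimensional smoothing step decays more slowly than $h$ and $h_1$.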

Our aim is to estimate $\mathrm{cov}(Y_t | \mathcal{F}_{t-1})$. Fan, Fan and Lv (2008) and Fan, Liao and Mincheva
(2013) showed that by incorporating the factor structure into the covariance matrix, the resulting
estimator has a better convergence rate than the usual sample covariance matrix under the norm
$\|\cdot\|_{\Sigma}$. To prove the convergence rate of $\widehat{\mathrm{cov}}(Y_t | \mathcal{F}_{t-1}) - \mathrm{cov}(Y_t | \mathcal{F}_{t-1})$ under the norm $\|\cdot\|_{\Sigma}$, we
impose the following assumption:

Assumption C4. For each $x \in \mathcal{X}$, $\|p_n^{-1}\{\Phi(x^{\top}\beta)\}^{\top}\Phi(x^{\top}\beta) - V_2\| = o(1)$, as $p_n \to \infty$, for some $q \times q$
symmetric positive definite $V_2$ such that $\lambda_{\min}(V_2)$ is bounded away from zero.
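
For readers unfamiliar with the norm $\|\cdot\|_{\Sigma}$, the sketch below evaluates it numerically, assuming the definition used by Fan, Fan and Lv (2008): $\|A\|_{\Sigma} = p^{-1/2}\|\Sigma^{-1/2} A \Sigma^{-1/2}\|_F$ for a $p \times p$ matrix $A$ and a positive definite $\Sigma$. The function name `sigma_norm` and the random test matrix are illustrative.

```python
import numpy as np


def sigma_norm(A, Sigma):
    """Return p^{-1/2} * || Sigma^{-1/2} A Sigma^{-1/2} ||_F for symmetric PD Sigma."""
    p = Sigma.shape[0]
    # Symmetric inverse square root via the eigendecomposition of Sigma.
    eigval, eigvec = np.linalg.eigh(Sigma)
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return float(np.linalg.norm(inv_sqrt @ A @ inv_sqrt, "fro") / np.sqrt(p))


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Sigma = M @ M.T + 5.0 * np.eye(5)   # a well-conditioned positive definite matrix
print(round(sigma_norm(Sigma, Sigma), 6))
```

Under this definition the norm of $\Sigma$ itself is 1 in every dimension, so an estimation error measured in $\|\cdot\|_{\Sigma}$ is expressed relative to the target rather than in absolute units.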

These assumptions are standard. The strong mixing condition in Assumption (A2) can be
relaxed to $\alpha(k) \le c k^{-\beta}$ with a large constant $\beta$. Assumptions (B1) and (B2) guarantee the existence
of the $2d$-th moment of $\epsilon_{\ell,t}$. For simplicity, we do not impose the conditions that ensure the
finiteness of the $d$-th moment of $\epsilon_{\ell,t}^2$. For more details, see Lindner (2009). Assumption (C4)
requires that the factors should be pervasive, that is, that they impact every individual time series. It was
also imposed in Fan, Fan and Lv (2008) and Fan, Liao and Mincheva (2011).

Appendix B: Proof of Theorem 1 (I)-(III) 


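For ease of presentation, we give some notation. Define

$$\delta_b = \|b - \beta\|, \quad \delta_{1n} = \left(\frac{\log(n)}{nh}\right)^{1/2}, \quad \delta_{2n} = \left(\frac{\log(n)}{n}\right)^{1/2}, \quad \delta_{3n} = \left(\frac{\log(n)}{nh_1}\right)^{1/2},$$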

and 6n = + h?din + Define 0 to be a compact set {b : ||b — (3\\ < cq, ||b|| = 1} with 

a small cq > 0. For a random sequence an, an = Oa.sX^n) for some sequence bn means that 
Plllonll > Cbn} = where e is defined in Assumption (C3). 

To prove Theorem 1, the following lemma is useful. 

Lemma B.1. Assume that Conditions (A1)-(A3) and (C3) in Appendix A hold and for some
d > 4, 


sup < 00 , 

l<e<Pn 


where d is defined in (C3). Then there exists a constant C > 0 such that 

-^Kh{Xjh-:>^h)€e} / 1 

71 < ^ 


P < sup sup 

(b,x)e(©,A’) 


t=i 


> C5in ) <0 


1+6^ 


The proof of Lemma B.1 follows the proof of Lemma 6.1 in Fan and Yao (2003),
with some of the constants involved in the proof modified. For instance, we instead use
Bn = {nhY/‘^{\og{n))~‘^. 

Denote Y = {Y 2 , • • • , Yn), Wh{z; b) = diag {Ar;i(A[b - z), • • • , Kh{XX^_ih - z)] and 

I 1 at A^b-z (ATb-z)AT \ 


A(z;b) = 


^ AT(z;b) ^ 
V ^(^;b) y 


1 XX A^_ib-z {Xl_Xo-z)XX ] 
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Let H = diag(lix(g+i),/ilix(q+i)) and denote hh{z;h) = H ^{X{z]h)YWh{z;h)X{z-,h)H ^ 
Denote ^2 = / u^K{u)du, tJ.i,{z) = E{X\X^h = z) and, for £ = 1, • • • = (e£, 2 , • • • , ei^nV, 

f£(2:;b) = |D/i(2;;b)| H-^X^{z;h)Wh{z;h)yf,Te{z) = {ge{z),{^i{z)f',ge{z),{^i{z)ff 

^eiz) = ige(z), i^eiz))^,geiz), i^eiz))^f ,rg(z) = >Oix(q+i)f • 

The following lemma gives the asymptotic representation of $\hat{\Gamma}_\ell(z; b)$.

Lemma B.2. Suppose that Assumptions (A1)-(A4) in Appendix A hold. Then we have that

HVi{z- b) = HVi{z) + {hh{z; h)]-^H-^X^{z; h)Wh{z; b)^ + HV[{z) (/ib(^)f (/9 - b) 

+ (z) + Oa.s.{h6-\a + 5ln<^b + '^b + ^n) ■ 

Proof of Lemma B.2. For $i = 2, \ldots, n$, denote $z_i = X_{i-1}^{\top}\beta$ and $z_{b,i} = X_{i-1}^{\top}b$. Using a
Taylor expansion, we obtain that


yt,i = gdzi) + ^eizi)Xi + ee,i = xj(z- b)r£(2;) + 

where • = ^^{z)T'l{z){zi - 2:b,i),rfb,i = 2 “^^^i(^)r^'( 2 ;)(zb,i - 2 :)^ 
^?h,i = ‘^~^Xli{z)r'l{z){zi - ^b,^)^rJ^b,^ = 0{\zi - z|3). 


For /c = 1, • • • , 4, denote r^^j^ = 2 ’''' ’ rl ^ Then 


Hf^{z-h) - HTi{z) = [^h{z-,h)^ 


£,b,2’ ’ £,b,n 

H-^X^{z-h)Wh{z-,h){ 


e, + + r(2) + 

^ u,b u,b u,b u,b 


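(I). Consider the term $\Omega_h(z; b)$. Following the proof of Theorem 5.3 in Fan and Yao (2003), we
have that there exists a large $C > 0$ such that

$$P\Bigl\{ \sup_{(b,z) \in \Theta \times \mathcal{Z}} \bigl\| n^{-1}\Omega_h(z; b) - E\{n^{-1}\Omega_h(z; b)\} \bigr\| > C\delta_{1n} \Bigr\} \le O\bigl(n^{-(1+\epsilon)}\bigr).$$

Let $\Omega(z; b) = \lim_{n \to \infty} n^{-1}E\{\Omega_h(z; b)\}$. Note that $n^{-1}E\,\Omega_h(z; b) = \Omega(z; b) + O(h)$ and $\Omega(z; b)$ is
positive definite. Therefore, $\Omega_h(z; b)$ is positive definite almost surely and

$$n^{-1}\Omega_h(z; b) = \Omega(z; b) + O_{a.s.}(h + \delta_{1n}).$$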

(II). Consider the term {z;h)Wh{z;h)Y^^\ (A: = I,-- - ,4). By specific matrix calcula¬ 

tions, we can show that 

H-^X^{z- h)Wh{z-, b)r;,^] = D(z; h)HV'^{z) {^lY>{z)f {P - b) + Oa.s. {h5^ + <5in<5b) , 
if"^A'’"(2;;b)IT,j(2:;b)r[j^^ = ]^gi^h'^9.(^z]h)HV'^{z) + Oa.s.{h^ + , 

H-^X^{z-,h)Wh{z-,h)v^^l = Oa.s.{ 5 l): X^ {z;h)Wh{z;h)v^^l = Oa.s.{^^ + + hHy, + hSl) 
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Combining (I) and (II), we obtain that 


HV^{z;h) = + ^H-^X^{z■MWh{z■,h)h + HT',{z){^l^{z)f{|3-h) 

2 !^'^h'^HT”{ z) + Oa.s.{h5\y + dln^b + '^b “I" ^n)- 

This completes the proof. 

The following lemma, Lemma B.3, gives the asymptotic relationship between $\hat{\beta}_{m+1}$ and $\hat{\beta}_m$,
where $\hat{\beta}_m$ is the $m$th step estimator based on our procedure in Section 2.

Without loss of generality, we consider m = 1. For each i, j = 1, ■ ■ ■ ,n — 1, define 

Wj = W - Xj,w^jih) = h-^K {xlh/h} . 

Given /3;^, for j = 1, • • • , n — 1, denote zj = X^Pi and 

fi = = YWh{zf3i)X{zf3i) {x^{zfA)Wh{zf3i)X{zf3i)}~^ • 

and 

^ n—1 

= -2-E-^‘Wife+»jV.+,iiv,(3,). 

Pn . . 1 

1 rj-i 

Un = ^^ Wj |t*+i - fjW(%;3i)|wij(3i), 

(^2 — /3i + U„. 

Lemma B.3. Suppose that Conditions (A1)-(A4), (B1)-(B4), (C1) and (C3) in Appendix A
hold. Then, we have

$$\hat{\beta}_2 - \beta = \frac{1}{2}(\hat{\beta}_1 - \beta) + \frac{1}{2}V_p^{-1}U_n + R_n, \qquad \text{(A.1)}$$

where R„ = Oa.s. {h52n + h~^^ 2 n + ^n + h~^S 2 nS-^^ + h6-^^ +h~^5'^ ^ . 

Proof of Lemma B.3. First, consider the term Un. For i,j = 1, • • • ,n — 1, denote 

ejj -1 = g\X]p) + ^\Xjf3)Xi+i,eij^2 = + B^Wj+i - e^^i, 

ei,3 = g{zi) + ^{zi)Xi+i + {g'{zi) + ^\zi)Xi+i){fXf^{Xj/3)f{P-p^), 

^ijA — ^ 1 ^^,3* 
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(a). Consider the main term U^i. Note that 


- e,,3 = e^+i + (g'(^J/3) + ^'(Xj/3)X,+i) {X, - P)f {(3 - 3i) + 0(5^^). 

Analogous to Lemma A.2 of Xia, Tong and Li (2002), it follows that 

^ n—1 

2 ^ ^ — Un Oa.sX^^nS'^ )• 

^ Pn . . 1 

Similarly, we obtain that 

— X,,e\, is'iXjp) + $'(x7/3)X,+i) (X, - f,f,{Xj(3)f w,j0,) = + Oa.s.{Sin + 5- ). 

n Pn . . 

*J=1 

Hence, we approximate the term \Jni as 

U„1 = + Vp(/3 - 3i) + Oa.s.{hn5^^ + 

(b). With the help of asymptotic representation of ^j{z) and empirical approximation theories, 
we can show that 


U„fc = Oa.s. {hS^^ + h-^ 62 nS^^ + +Sn),k = 2, 3,4. 

(c). In a similar fashion, we can also show that

= 2Vp + Oa.s. (^3^ +h + h-^S2n) ■ 

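Therefore, $\hat{\beta}_2 - \hat{\beta}_1 = 2^{-1}(\beta - \hat{\beta}_1) + 2^{-1}V_p^{-1}U_n + R_n$, which means that

$$\hat{\beta}_2 - \beta = \frac{1}{2}(\hat{\beta}_1 - \beta) + \frac{1}{2}V_p^{-1}U_n + R_n. \qquad \text{(A.2)}$$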

This completes the proof. 

Proof of Theorem 1 (I). First, by Lemma B.3, for the $m$-th step ($m \ge 1$), we have

$$\hat{\beta}_{m+1} - \beta = \frac{1}{2}(\hat{\beta}_m - \beta) + \frac{1}{2}V_p^{-1}U_n + R_{n,m}, \qquad \text{(A.3)}$$

where ||Rn,m|| < M ^6^ {h + h~^S 2 n + ) + Sn + h52n + and 

||V~^U,i|| < M52n a.s., with some large positive constant M. Here we take M > 1 and /i < 1 for 
sufficiently large n. Note that as n —)■ oo, the bandwidth h satisfies /i —)■ 0, h~^52n —^ 0, Snh~^ —)■ 0 
and h~‘^S 2 n —^ 0. We can assume that 

h + h-H2n < (8M)-\ M{5n + Man + h-Hl^) + M52n < i32M)-^h. 
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Then, if < (8M) ^h, then 5/3^^^ < (8M) and 

^f3m+l — + h52n + h + M52n- 

Note that we can choose the initial estimator (3i which satisfies ||5/3j| < {8M)~^h for sufficiently 
large n. Therefore, 

+ 4 “I + ^4^ I + hS2n + h ^ 62 n + <J2n)| • 

Taking m —)• oo, it follows that the final estimator /3 satisfies (5^ = ||/3—/3|| = Oa.s. (5n+(^2n+/i~^<5|„) 
and hence ||R„,oo|| = Oa.s. {h^ + . It also follows from the expression (A.3) that 

P {Il3 - /3 - v;iUn|| >C{h^ + <5L) } < o . 

This completes the proof of Theorem 1(I).
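
The limiting argument in the proof above can be made explicit by unrolling the one-step recursion (A.3). The derivation below is a sketch that treats $U_n$ as not depending on the step $m$, as the notation in (A.3) suggests, and writes $R_{n,k}$ for the remainder at step $k$.

```latex
\begin{align*}
\hat{\beta}_{m+1} - \beta
  &= \tfrac{1}{2}(\hat{\beta}_m - \beta) + \tfrac{1}{2}V_p^{-1}U_n + R_{n,m} \\
  &= 2^{-m}(\hat{\beta}_1 - \beta)
     + \Bigl(\sum_{k=1}^{m} 2^{-k}\Bigr) V_p^{-1}U_n
     + \sum_{k=1}^{m} 2^{-(m-k)} R_{n,k} \\
  &= 2^{-m}(\hat{\beta}_1 - \beta)
     + \bigl(1 - 2^{-m}\bigr) V_p^{-1}U_n
     + \sum_{k=1}^{m} 2^{-(m-k)} R_{n,k}.
\end{align*}
```

As $m \to \infty$ the contribution of the initial estimator vanishes geometrically, the coefficient of $V_p^{-1}U_n$ tends to 1, and the accumulated remainder is at most $2\sup_k \|R_{n,k}\|$ in norm. This is why the final estimator satisfies $\hat{\beta} - \beta = V_p^{-1}U_n$ up to a remainder of the same order as a single $R_{n,m}$.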

Proof of Theorem 1 (II) and (III). Lemma B.2 tells us that, for f = 1, • • • ,pm 

deiz) -9e{z) = e^{nhAz',d)}~^H:[^X^{z]P)Wh^{z]]3)ei + ^fi 2 hlgi{z) +R„(z), 

where ei = (1,0,--- ,0)'^, Hi = diag(lix(q+i),/rilix(g+i)) and 

p| sup|R„( 2 ;)| > C{h^ + + = O • 

for some constant C > 0. 

(a). Consider the term $\Omega_{h_1}(z; b)$. Following the proof of Theorem 5.3 in Fan and Yao (2003),
we have that there exists a large $C > 0$ such that


P 


sup 

(b,^)G©x.E 


^hi{z;h) - P|fl/ij(2:;b)| 


> Cdsn C < O 




Let n(2;; b) = lim„^oo (^^i b)|. Note that n (z; b) = ^(z; b) + 0(/i.i) and 11(2;; b) 

is positive definite. Therefore, ll/i^(2;;b) is positive definite almost surely and 


P 


sup 

{b,^)e©xz 


-llh^(2;;b) - fl{z-,h) 


n 




(b). By Lemma B.l, we have 


pi sup sup ||-iLiA'^(2;;b)lF/ij(2:;b)e£||2 > C'(53n ) < O 
yi<£<p„ (b,2)G©x2 ^ / 


1+6^ 


n 


Therefore, combining (a) and (b), there exists a large $C > 0$ such that

$$P\Bigl\{ \sup_{z \in \mathcal{Z}} \|\hat{g}(z) - g(z)\|_\infty > C(h_1^2 + \delta_{3n}) \Bigr\} \le O\bigl(n^{-(1+\epsilon)}\bigr).$$

This completes the proof of Theorem 1(II). Theorem 1(III) can be proven analogously.
Appendix C: Proof of Theorem 1 (IV)

Before we prove Theorem 1(IV), we first give the convergence rate of the difference between the
estimated residual $\hat{\epsilon}_t$ and the true residual $\epsilon_t$.

Lemma C.1. Suppose that Assumptions (A1)-(A5), (B1)-(B4), (C1) and (C3) in Appendix
A hold. Then there exist $C > 0$ and small $\epsilon > 0$ such that

$$P\Bigl\{ \sup_{t} \|\hat{\epsilon}_t - \epsilon_t\|_\infty > C(h_1^2 + \delta_{3n}) \Bigr\} \le O\bigl(n^{-(1+\epsilon)}\bigr).$$

Proof of Lemma C.l. For each t = 2, - ■ ■ ,n, 

2t-et= %{Xt3) - g(vT_,/3) + - ^{Xj_,(3)) Xt. 

Note that s{X^_3) - g(Aj_i/3) = g'{X^_^P*)X^_^3 - /3) + g{Xj_3) - g{X^_3), and 

{^{Xj_3)-^{xU(3))Xt = ^'{Xj_,^*)XtXj_S-P) 

+mxj_3) - ^iXj_3))Xt. 


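Hence, there exists a large constant $C > 0$ such that

$$\|\hat{\epsilon}_t - \epsilon_t\|_\infty \le \sup_{z \in \mathcal{Z}} \|\hat{g}(z) - g(z)\|_\infty + \sup_{z \in \mathcal{Z}} \|\hat{\Phi}(z) - \Phi(z)\|_\infty + C\|\hat{\beta} - \beta\|,$$

where $\sup_{z \in \mathcal{Z}} (\|g'(z)\|_\infty + \|\Phi'(z)\|_\infty) = O(1)$ is used in the last term. For any $v > 0$, we have the
following inequality

$$P\Bigl\{ \sup_{t} \|\hat{\epsilon}_t - \epsilon_t\|_\infty > 3v \Bigr\} \le P\bigl\{ \|\hat{\beta} - \beta\| > v/C \bigr\} + P\Bigl\{ \sup_{z \in \mathcal{Z}} \|\hat{g}(z) - g(z)\|_\infty > v \Bigr\} + P\Bigl\{ \sup_{z \in \mathcal{Z}} \|\hat{\Phi}(z) - \Phi(z)\|_\infty > v \Bigr\}.$$

Take $v = C(h_1^2 + \delta_{3n})$ for a large constant $C > 0$. It follows from parts (II) and (III) of Theorem 1
that there exists a constant $C > 0$ such that

$$P\Bigl\{ \sup_{t} \|\hat{\epsilon}_t - \epsilon_t\|_\infty > C(h_1^2 + \delta_{3n}) \Bigr\} \le O\bigl(n^{-(1+\epsilon)}\bigr).$$

This completes the proof of Lemma C.1.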

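Now we are going to prove Theorem 1(IV). Define the quasi log-likelihood function

$$Q_{\ell,n}(\theta) = n^{-1}\sum_{t=1}^{n} \left\{ \frac{\epsilon_{\ell,t}^2}{\sigma_{\ell,t}^2(\theta)} + \log \sigma_{\ell,t}^2(\theta) \right\},$$

where $\sigma_{\ell,t}^2(\theta)$ is the solution of

$$\sigma_{\ell,t}^2(\theta) = \alpha_0 + \sum_{i=1}^{m} \alpha_i \epsilon_{\ell,t-i}^2 + \sum_{i=1}^{s} \gamma_i \sigma_{\ell,t-i}^2(\theta).$$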
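
The quasi log-likelihood above is straightforward to evaluate once the conditional variances have been built recursively. The sketch below does this for the GARCH(1, 1) case, $m = s = 1$; the function names, the default initial variance (the sample mean of the squared residuals) and the toy residuals are assumptions made for illustration, since the initial values used in Section 2 are not restated here.

```python
import numpy as np


def garch11_sigma2(eps, theta, sigma2_init):
    """Conditional variances from sigma2[t] = a0 + a1 * eps[t-1]^2 + g1 * sigma2[t-1]."""
    alpha0, alpha1, gamma1 = theta
    sigma2 = np.empty(len(eps))
    sigma2[0] = sigma2_init
    for t in range(1, len(eps)):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + gamma1 * sigma2[t - 1]
    return sigma2


def quasi_loglik(eps, theta, sigma2_init=None):
    """Average of eps_t^2 / sigma_t^2(theta) + log sigma_t^2(theta); smaller is better."""
    eps = np.asarray(eps, dtype=float)
    if sigma2_init is None:
        sigma2_init = float(np.mean(eps ** 2))  # assumed initial value
    sigma2 = garch11_sigma2(eps, theta, sigma2_init)
    return float(np.mean(eps ** 2 / sigma2 + np.log(sigma2)))


# Toy residuals: with these parameters every conditional variance equals 1,
# so each summand is 1 / 1 + log(1) = 1.
print(quasi_loglik([1.0, -1.0, 1.0], theta=(0.5, 0.25, 0.25)))
```

In practice the objective would be minimised over the compact parameter set of Assumption (B3), with the estimated residuals plugged in for the unobserved errors.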
For convenience, denote the true value of $\theta_\ell$ by $\theta_{\ell,0}$. First, we consider the consistency of $\hat{\theta}_\ell$. Recall
that the observed quasi log-likelihood function is

$$\tilde{Q}_{\ell,n}(\theta) = n^{-1}\sum_{t=1}^{n} \left\{ \frac{\hat{\epsilon}_{\ell,t}^2}{\tilde{\sigma}_{\ell,t}^2(\theta)} + \log \tilde{\sigma}_{\ell,t}^2(\theta) \right\},$$

where $\tilde{\sigma}_{\ell,t}^2(\theta)$ is defined in Section 2. Following the proof of Theorem 7.1 in Francq and Zakoian
(2009), we shall establish the following results:

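(a1) $\sup_{1 \le \ell \le p_n} \sup_{\theta \in \Lambda} |\tilde{Q}_{\ell,n}(\theta) - Q_{\ell,n}(\theta)| \to 0$, a.s., as $n \to \infty$;

(a2) If there exists some $t$ such that $\sigma_{\ell,t}^2(\theta) = \sigma_{\ell,t}^2(\theta_{\ell,0})$ a.s., then $\theta = \theta_{\ell,0}$;

(a3) $E_{\theta_{\ell,0}}|v_{\ell,t}(\theta_{\ell,0})| < \infty$, and if $\theta \ne \theta_{\ell,0}$, $E_{\theta_{\ell,0}} v_{\ell,t}(\theta) > E_{\theta_{\ell,0}} v_{\ell,t}(\theta_{\ell,0})$;

(a4) For any $\theta \ne \theta_{\ell,0}$, there exists a neighbourhood $U(\theta)$ such that

$$\liminf_{n \to \infty} \inf_{\theta^* \in U(\theta)} \tilde{Q}_{\ell,n}(\theta^*) > E_{\theta_{\ell,0}} v_{\ell,2}(\theta_{\ell,0}), \quad \text{a.s.}$$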

By the proof of Theorem 7.1 in Francq and Zakoian (2009), we only need to prove (a1). Denote

( ao + E” . \ 

0 




mAQ) = 




\ f^|,i_^+i(0) j 


,^A^) = 


V 


J 


/ ^1 ^2 ••• 7s ^ 


, = 


1 0 


V 


1 0 


We have the relationship The condition (B2) and the compactness of A 

implies that p = sup^^^V P(^i) < 1) where p(B) means the spectral radius of B. Furthermore, 
can be expressed as 

t-i 

^ ^e^,t-k + B^ct^ 0- 


k=0 


Let ^^^(0) be the vector obtained by replacing aj^_^{9) by a‘j^_^{9) in g^t{9), and let be the 
vector obtained by replacing by and • • • , r^ 2 -m hy the initial values. Then we have 


t-i 


— X/ ^e^,t-k + B^CT^ Q. 


k=0 


Denote A = sup^<,^ Then, if t > m + 1, 




m 


i=i 


< (i| + ‘2di 

i=i 
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As a result, for t > m + 1, we obtain that 




^ ^ ^ei^e,t-k ~ Qie,t-k) + ^ ^ei^,t-k ~ ^e,t 


I k=t—m-\-2 


+ Q - (; 


t—1 m 


< C- + (if ^ ^ aj +/9 *||ct£^o “ ^£,o)ll > 


k=0 j=l 


for some constant C > 0. We thus have 


sup lQe,n(0) - Qe,n(^)l < n ^^sup 


~2 2 
pi2 ^2 


££,t + log 


=2 0eA 11 


< - C ■ (dj + di + n p^ejA + — -C-n 


where ai = inf^^^ |a£,o|- Note that de < C ■ (/if + (53^), a.s. and supf<p^ < oo implies that 
j —)■ 0,a.s. Then sup 3 <f<p^ ®^P0eA \Qi,n{^) ~ Qi,n{(^)\ —^ 0,a.s., and part (a) follows. 

Next, we consider the convergence rate of $\sup_{1 \le \ell \le p_n} \|\hat{\theta}_\ell - \theta_{\ell,0}\|$. The proof of this part is based
on a standard Taylor expansion of $\tilde{Q}_{\ell,n}(\theta)$ at $\theta_{\ell,0}$. Since $\hat{\theta}_\ell$ converges to $\theta_{\ell,0}$, which lies in the
interior of the parameter space, we thus have


0 = n ^Y1 


-1 dvi^ti^e) 

^ 50 


_ -1 dv£^t{f^e,o) ( 1 5^Uf^t(0£) 


(0£ - 0£,o)) 


where $\theta_\ell^*$ is between $\hat{\theta}_\ell$ and $\theta_{\ell,0}$. Suppose we have shown that there exist two positive constants
$C_1$ and $C_2$ such that


P i sup - > --- 

[i<f<p„ do 


>Ci{hl+S3n) }=0 


Denote 


p\ inf inf A„i., 

1 i<«<Pn 0ey(0o) 5050^ 


<nC2> =0 


An = s inf inf Aj, 

1 1<£<P„ 6eV{Oe^o) 


-1 Y^ 9‘^U£t(0) 

n > - 

5050^ 




where C 2 is defined in (A.5). Then, for each x > 0, 

p\ sup 0f-0f,o >x|<p| sup ^ 


1<£<P 


l<t<Pn \\^2 


5^^£,t(0£,o) 

50 


> nC2x\ + P{An)- (A.6) 
It is not hard to see that (A.4) can be proved from (b1) and (b2), and (A.5) follows from (b3)-(b5).
We now prove them separately.

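(b1). It is easy to show that

$$\frac{\partial v_{\ell,t}(\theta)}{\partial \theta} = \left\{ 1 - \frac{\epsilon_{\ell,t}^2}{\sigma_{\ell,t}^2(\theta)} \right\} \left\{ \frac{1}{\sigma_{\ell,t}^2(\theta)} \frac{\partial \sigma_{\ell,t}^2(\theta)}{\partial \theta} \right\}$$

and

$$E\left\| \frac{\partial v_{\ell,t}(\theta_{\ell,0})}{\partial \theta} \right\|^{d} < \infty.$$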
Note that $\{\partial v_{\ell,t}(\theta_{\ell,0})/\partial \theta,\ t \le n\}$ are strictly stationary and $\alpha$-mixing with geometric rate. (Also see Lindner
(2009).) It follows from Theorem 2 (ii) of Liu, Xiao and Wu (2013) that there exist positive
constants $C_1$, $C_2$ and $C_3$ such that for all $x > 0$,




dve 4 ^efl) 

de 


> ^ M TTT + ^2 exp - 


nV2 j • 


Hence, by taking $x = C\delta_{2n}$ for a large constant $C > 0$, we obtain that


P < sup ^ 

[ i<£<Pn ^_2 


90 


- c4^og(n))^^% ^ 


< O 


(b2). Similar to (al) in this proof, we have that 


daj/e) daj^ie) 

de de 


t —1 m 


< C{dj + de T.f"!: \et-k-j \ + P^)- 


k =0 j=l 


We also obtain that 


1 1 




^£,t ~2 < C{de + de + p ), <l + C{de + de + p). 

^ 0 + ^0 + ^0 + 


As a result, for i = 1, - ■ ■ ,m + s + l, the i-th component of the difference 
bounded above by 

\dve,t(ee,o) dve,t{ee,o)\ 


\dvi^t{ei,o) I ■ 


^e,t ^e,t 




1 


rh de 


< + 1-^ he,-^ 


1 \ 




+ 1-d 


1 / 9aet 




erd \ dOi dOi 


i^£,o) + 


dj + de\ee,t\ 1 daj^ 


-2“7^(^Lo) 


< C{de + d£ + p*)(l + pj^t) 1 + 


1 9cr|d^Lo) 
a^d^Lo) dOi 


Then it follows that, for i = 1, • • • , m + s + 1, 


dve^t{(^e,o) dve^t{^e,o) 
^ Wi ^ Mi 

t =2 * t =2 * 


C(dj + de)'^^{l + Pe,t) 1 + 


p^{I + Pe,t) 1 + 


1 daj^ieefl) 
5|t(0£,o) d9i 
1 ddj/eefi) I 


aj^{eefl) d9i 


By the Markov and Burkholder inequalities for martingales, we claim that there exists a constant $C > 0$
such that


P\ sup ^d(l + 0!t) 1 + 


£<t<Pn ^2 


1 

s|,,(e/,o) ds. 


32 



and 


P< sup '^{l+7jlt 

[ l<^<Pn j_2 


1 + 


1 dal^iOefl) 




dOi 


>071^=0 


l+£ 


n 


Note that $\sup_{\ell \le p_n} |d_\ell| = O_{a.s.}(h_1^2 + \delta_{3n})$. Hence, it follows that there exists a constant $C > 0$ such
that


-1 


P < sup n 

[ l<£<Pn 

and part (b2) follows. 


dv£^t{diQ) 


t=2 


t=2 


de 


> C{hj + 53n) } = O 


1 


!+£■ 


n 


(b3). n ^ Y1^=2 ^ can be expressed as 

dOdO 


-1 _ -1 A f d^ViA^efl) p /^ f 

” ^ ^ I V 9090^ y J ^ I dede^ 


Note that infi<£<p„ E 
c > 0, 


f 9 ^<,f(^co) 1 -g pQgi|;^yg (definite. It suffices to show that, for any constant 


t 9090" 


P 


sup n 
i<e<Pn 


22 r d^ve^t{Oe,o) _ ^ / d‘^veA^ifi) \ 

^ 1 deae^ V oeae^ ) 




Similar to (b1), we claim that there exist three positive constants $C_1$, $C_2$ and $C_3$ such that


P < sup n 

I l<^<Pn 


-1 


E 

t=2 


d^veA^e,o) 

aeae^ 


- E 


<Ci 


VnU 

{ncY 


+ C'2Pnexp(-C'3n^c^) = O 


aeae^ 

1 


> c 


l+£^ 


n 


Part (b3) follows. 

(b4) and (b5). Together with the proof of (c) in Theorem 7.2 of Francq and Zakoian (2011),
these two parts can be proved in a similar fashion to (b2) and (b3).


Appendix D: Proof of Theorem 2 

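Define $E_n = \hat{\Phi}(X_n^{\top}\hat{\beta}) - \Phi(X_n^{\top}\beta)$ and $F_n = \hat{\Sigma}_x(X_n) - \Sigma_x(X_n)$. The difference $\widehat{\mathrm{cov}}(Y_{n+1} | \mathcal{F}_n) -
\mathrm{cov}(Y_{n+1} | \mathcal{F}_n)$ can be decomposed into four parts:

$$\widehat{\mathrm{cov}}(Y_{n+1} | \mathcal{F}_n) - \mathrm{cov}(Y_{n+1} | \mathcal{F}_n) = E_n\hat{\Sigma}_x(X_n)E_n^{\top} + \Phi(X_n^{\top}\beta)F_n\{\Phi(X_n^{\top}\beta)\}^{\top} + (\hat{\Sigma}_{0,n} - \Sigma_{0,n}) + \bigl[ \Phi(X_n^{\top}\beta)\hat{\Sigma}_x(X_n)E_n^{\top} + E_n\hat{\Sigma}_x(X_n)\{\Phi(X_n^{\top}\beta)\}^{\top} \bigr].$$

We thus bound $\|\widehat{\mathrm{cov}}(Y_{n+1} | \mathcal{F}_n) - \mathrm{cov}(Y_{n+1} | \mathcal{F}_n)\|_{\Sigma}^2$ by

$$4\bigl\| E_n\hat{\Sigma}_x(X_n)E_n^{\top} \bigr\|_{\Sigma}^2 + 4\bigl\| \Phi(X_n^{\top}\beta)F_n\{\Phi(X_n^{\top}\beta)\}^{\top} \bigr\|_{\Sigma}^2 + 4\bigl\| \hat{\Sigma}_{0,n} - \Sigma_{0,n} \bigr\|_{\Sigma}^2 + 4\bigl\| \Phi(X_n^{\top}\beta)\hat{\Sigma}_x(X_n)E_n^{\top} + E_n\hat{\Sigma}_x(X_n)\{\Phi(X_n^{\top}\beta)\}^{\top} \bigr\|_{\Sigma}^2.$$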
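
The four-part decomposition is an exact algebraic identity, which can be confirmed numerically. The sketch below assumes the conditional covariance has the form $\Phi\Sigma_x\Phi^{\top} + \Sigma_0$, writes the estimation errors as $E = \hat{\Phi} - \Phi$ and $F = \hat{\Sigma}_x - \Sigma_x$, and uses random matrices of hypothetical size ($p = 6$, $q = 3$) in place of real estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 6, 3  # hypothetical dimensions: p series, q factors


def random_spd(k):
    """Random symmetric positive definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)


# "True" quantities and perturbed "estimates".
Phi = rng.standard_normal((p, q))
Phi_hat = Phi + 0.1 * rng.standard_normal((p, q))
Sx, Sx_hat = random_spd(q), random_spd(q)
S0 = np.diag(rng.uniform(0.5, 1.5, p))
S0_hat = np.diag(rng.uniform(0.5, 1.5, p))

E = Phi_hat - Phi   # error in the loading matrix
F = Sx_hat - Sx     # error in the factor covariance

lhs = (Phi_hat @ Sx_hat @ Phi_hat.T + S0_hat) - (Phi @ Sx @ Phi.T + S0)
rhs = (E @ Sx_hat @ E.T
       + Phi @ F @ Phi.T
       + (S0_hat - S0)
       + (Phi @ Sx_hat @ E.T + E @ Sx_hat @ Phi.T))
print(np.allclose(lhs, rhs))
```

The factor 4 in the bound then comes from the elementary inequality $(a + b + c + d)^2 \le 4(a^2 + b^2 + c^2 + d^2)$ applied to the norms of the four parts.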

To bound these terms, we first introduce the following two lemmas. 

Lemma D.1. Suppose that Assumptions (A1)-(A5), (B1)-(B4) and (C1)-(C4) in Appendix A
hold. Then there exists a large $C > 0$ such that


E„ >Cpn{hi + 


log(n) 


7.1+^ / 


P F,, >C hU 


log(n) 


Proof of Lemma D.1. (i) Observe that


>{X3) - ^{Xl(3) = ^'{Xl(3*)Xl0 -P)+ {${XlP) - ^{Xl^)) , 


where $\beta^*$ is between $\hat{\beta}$ and $\beta$. As a result,


llEnlll < 2|| sup$'(z)||i • ||A„f • 11/3 - /3||i + 2 • sup $(z) - $(z) . 

z&z'' 11^’ 

Note that $\|\sup_{z \in \mathcal{Z}} \Phi'(z)\|_F^2 = O(p_n)$. Therefore, part (i) follows from Theorem 1(I) and
(III).

(ii) Let $K_{h_2,t}(u) = K_{h_2}(X_{t-1} - u)$ and let $\varphi(X_t)$ be a function bounded uniformly over $X_t \in \mathcal{X}$.
By following the proof of Theorem 5.3 in Fan and Yao (2003), we can see that there exists a large
$C > 0$ such that


^ ^V^(-^t)-^fe2,t(u) - E|(/)(Yt)A:/i2,t(u)| > 


. 

\ / 


By setting $\varphi(X_t) = 1$, $X_j$, $X_jX_k$ ($j, k = 1, \ldots, q$), part (ii) follows.

Lemma D.2. Suppose that Assumptions (A1)-(A5), (B1)-(B4), (C1) and (C3) in Appendix
A hold. Then there exist $C > 0$ and small $\epsilon > 0$ such that


P < sup 

I l<e<Pn 


•^2 2 ! 
^e,n+i ~ ^e,n+i 


>C{ftf + i,„)|<o(^). 


Proof of Lemma D.2. Let $B(i, j)$ be the $(i, j)$th element of the matrix $B$ and $A(i)$ be the
$i$th entry of a vector $A$. The estimated conditional variance can be expressed as

n 5 

_2 V—^ '^h _ . ^> —TL+l _2 

^i,n+i = (1, l)c£ ,^+i_fc(l) + 2_^ (1, i)gii Q{i), 

fc=0 i=l 
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where is the matrix obtained by replacing by in and ^ and ^ g are defined 
accordingly. Note that the true conditional variance 

n s 

^e,n+l — X/ ^ i)^^o(f). 

fc=0 i=l 

We thus have that 

n n 

^e,n+i ~ (^e,n+i = ^ (1,1) (c^ ^ ^B^ - Bf^ (1, 

k=0 k=l 

s 

+ (1) “ ^£'*'^(1) ~ 
i=l 

(a) Consider the term Ue^i and observe that ||c£^^ — J| < IS^^g — a^^gl + dj + 2(4 \^e,t-j\ ■ 

Then, there exists a constant C > 0 such that 


|t4,i| < C(|«£,o ~ CK£,ol + dj + d£ 

j=i fc=i 

Since X]fc=i lQ,t-A:-j| /<7£n+i bounded and > a^^g > 0, this means that 


Ue,i 


a 


£,n+l 


< C'(|S£,o — a£,o| + <4)- 


and consequently, there exists a large constant C > 0 such that 


P < sup 

I l<£<Pn 


Ut^i 


O': 


£,n+l 


> C{hl + <53n) ) = o 


n 


!+£■ 


(b) Consider the term Ui^ 2 - Denote 4 = sup;^<j<^ — 7£,i|/7£,i. By the definition of B^ and 

B, it is seen that 


B,^l,l)-B,"(l,l) 

Bt(l,l) 


< max{|(l — 5i)^ 


1|, 1(1+ (5£)'=-1|} <24^(1+ ,5£)'^-\ 


for small S^. Note that > a^^g + B£(l, l)c£ and the relation x/{l + x) < for all 

X > 0 and 6 £ (0,1). We have that 


D£,2 

n 

< V 

(bJ-b^) (1,1) 

B^(l, l)c£ ,j_,_i_fc(l) 

2 

'^£,n+l 

- 

k=l 

1 

B"(l,l) 

«£,0 + B£(1, l)c£ ,^_,_l_fc(l) 


n 

< 2,5£j;Ml + 5£)V''c,Vi-fc(l)- 

k=l 
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Hence, by choosing a suitable but small 5, it follows from Theorem 1(IV) that there exists a large 
positive constant C such that 


p\ 

f 

sup 

Ue,2 

^2 

1 

[l<l<Pn 

^e,n+l 


> C{hl + Ssn) 


< P 


sup > C{hl + dsn) 
i<e<Pn 



(c) It is easy to see that \\U^^z \\/is bounded. Lemma D.2 follows. 


Proof of Theorem 2. 

(a). Now we bound $\|E_n\hat{\Sigma}_x(X_n)E_n^{\top}\|_{\Sigma}^2$. Observe that

s 

Pn E„5]a;(A^n)E„ ^ (cOv(l^+ljT'n) ) A^gg^ E^ ^ ■ 

Hence, it follows from Lemma D.l that there exists C > 0 such that 

F {||e„e.(x„)eJ i; > Cp. [kf + } = o (^) . 

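(b). We bound $\|\Phi(X_n^{\top}\beta)F_n\{\Phi(X_n^{\top}\beta)\}^{\top}\|_{\Sigma}^2$. Note that $\|\{\Phi(X_n^{\top}\beta)\}^{\top}(\mathrm{cov}(Y_{n+1} | \mathcal{F}_n))^{-1}\Phi(X_n^{\top}\beta)\| =
O(1)$. Hence, we have that $\|\Phi(X_n^{\top}\beta)F_n\{\Phi(X_n^{\top}\beta)\}^{\top}\|_{\Sigma}^2 \le O(p_n^{-1})\|F_n\|_F^2$, and consequently, by
Lemma D.1, there exists $C > 0$ such that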

p|||4.(xT/J)F„{^(Xl/3)f > Cp;‘ (hi + } = ° (;?) ■ 

^ 2 

(c). We bound $\|\hat{\Sigma}_{0,n} - \Sigma_{0,n}\|_{\Sigma}^2$. Note that

2 2 

Eo,n - So,n < COv(Fn+l|-T„)“^/^ (So,n - So,n) COv(yn+l|-Tn)“^'^^ 

S V / 2 

^2 2 ^ 

. ^e,n+l ~ ^e,n+l 

< sup - 2 - • 

l<e<Pn '^i,n+l 

Hence we obtain from Lemma D.2 that there exists $C > 0$ such that


En.n — 5]r 


>ci hi + 


log(n) 


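(d). Now we bound $\|\Phi(X_n^{\top}\beta)\hat{\Sigma}_x(X_n)E_n^{\top} + E_n\hat{\Sigma}_x(X_n)\{\Phi(X_n^{\top}\beta)\}^{\top}\|_{\Sigma}^2$. Note that for two $q \times q$
matrices $A$ and $B$, $\|A + B\|_F^2 \le 2(\|A\|_F^2 + \|B\|_F^2)$, $\|AB\|_F \le \|A\|_F\|B\|_F$ and $|\mathrm{tr}(AB)| \le \|A\|_F\|B\|_F$.
We have that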

Pn ^{XlP)^,iXn)^n + ^n^x{Xn)mXlP)f 

<2||cov(y„+i|T-„)-i/2^(xT/3)Ex(x)ETcov(y„+i|T-„)-i/2||2^ 

= 2tr (^^^{Xn)^nCOY{Yn+l\Pn)-^^n^x{Xn)mXlp)}^COY{Yn+l\Pn)-^MXlp)^ 

< 2g2||S,(X„)|||A„,gx (cov(y„+i|T-„)-i) A^^gx ({$(A)^/3)f cov(y„+i|y„)"i$(AT^)) • ||E„|||. 
Hence, by Lemma D.1, together with $\lambda_{\max}(\{\Phi(X_n^{\top}\beta)\}^{\top}\mathrm{cov}(Y_{n+1} | \mathcal{F}_n)^{-1}\Phi(X_n^{\top}\beta)) = O(1)$, it follows
that there exists $C > 0$ such that


P 


^{Xl(3)T,^{Xn)K + P-nX.Mn){^{Xll3)f > C { hf + 


logn 

n/ii 


< O 


!+£■ 


n 


Combining (a)-(d), Theorem 2 follows. This completes the proof of Theorem 2. 
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